Abstract

This paper addresses the basic question of how well a tree can approximate the distances of a metric
space or a graph. Given a graph, the problem of constructing a spanning tree that strongly
preserves distances in the graph is fundamental in network design. We present scaling distortion
embeddings where the distortion scales as a function of $\epsilon$, with the guarantee that for each $\epsilon$ the distortion
of a fraction $1 - \epsilon$ of all pairs is bounded accordingly. Such a bound implies, in particular, that the average
distortion and $\ell_q$-distortions are small. Specifically, our embeddings have constant average distortion
and $O(\sqrt{\log n})$ $\ell_2$-distortion. This follows from the following results: we prove that any metric space
embeds into an ultrametric with scaling distortion $O(\sqrt{1/\epsilon})$. For the graph setting we prove that any
weighted graph contains a spanning tree with scaling distortion $O(\sqrt{1/\epsilon})$. These bounds are tight even
for embedding into arbitrary trees. For probabilistic embedding into spanning trees we prove a scaling
distortion of $O(\log^2(1/\epsilon))$, which implies constant $\ell_q$-distortion for every fixed $q < \infty$.

1 Introduction 

The problem of embedding general metric spaces into tree metrics with small distortion has been central to
the modern theory of finite metric spaces. Such embeddings provide an efficient representation of the complex
metric structure by a very simple metric. Moreover, the special class of ultrametrics (rooted trees with equal
distances to the leaves) plays a special role in such embeddings [6, 9]. Such an embedding provides an even
more structured representation of the space, which has a hierarchical structure [6]. Probabilistic embeddings
into ultrametrics have led to algorithmic applications for a wide range of problems (see [18]). An important
problem in network design is to find a tree spanning the network, represented by a graph, which provides a
good approximation of the metric defined by the shortest-path distances in the graph. Different notions
have been suggested to quantify how well distances are preserved, e.g. routing trees and communication
trees [23]. The papers [3, 12] study the problem of constructing a spanning tree with low average stretch,
i.e., low average distortion over the edges of the graph. It is natural to define our measure of quality for
the embedding to be its average distortion over all pairs, or alternatively the stricter measure of its
$\ell_q$-distortion. Such notions are very common in most practical studies of embeddings (see for example
[16, 17, 4, 14, 21, 22]). We recall the definitions from [2]: Given two metric spaces $(X, d_X)$ and $(Y, d_Y)$, an
injective mapping $f : X \to Y$ is called an embedding of $X$ into $Y$. An embedding is non-contractive if for
any $u \ne v \in X$: $d_Y(f(u), f(v)) \ge d_X(u,v)$. For a non-contractive embedding let the distortion of the pair
$\{u,v\}$ be $\mathrm{dist}_f(u,v) = \frac{d_Y(f(u),f(v))}{d_X(u,v)}$.
Definition 1 ($\ell_q$-distortion). For $1 \le q \le \infty$, define the $\ell_q$-distortion of an embedding $f$ as:

$\mathrm{dist}_q(f) = \|\mathrm{dist}_f(u,v)\|_q^{(\mathcal{U})} = \mathbb{E}[\mathrm{dist}_f(u,v)^q]^{1/q},$

where the expectation is taken according to the uniform distribution $\mathcal{U}$ over $\binom{X}{2}$. The classic notion of
distortion is expressed by the $\ell_\infty$-distortion and the average distortion is expressed by the $\ell_1$-distortion.
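
As a concrete illustration of Definition 1, the following Python sketch computes the $\ell_q$-distortion of an embedding between two finite metric spaces. The function name `lq_distortion` and the toy example (the 4-cycle embedded into a path) are illustrative assumptions and do not appear in the paper.

```python
import itertools
import math

def lq_distortion(points, d_x, d_y, f, q):
    """l_q-distortion of a non-contractive embedding f, averaged over all unordered pairs."""
    ratios = []
    for u, v in itertools.combinations(points, 2):
        ratio = d_y(f(u), f(v)) / d_x(u, v)
        if ratio < 1 - 1e-12:
            raise ValueError("embedding contracts the pair (%r, %r)" % (u, v))
        ratios.append(ratio)
    if math.isinf(q):
        return max(ratios)  # classic worst-case distortion
    return (sum(r ** q for r in ratios) / len(ratios)) ** (1.0 / q)

# Hypothetical example: the 4-cycle 0-1-2-3-0 embedded into the path 0-1-2-3.
cycle = lambda u, v: min(abs(u - v), 4 - abs(u - v))
path = lambda u, v: abs(u - v)
identity = lambda u: u
points = [0, 1, 2, 3]

average = lq_distortion(points, cycle, path, identity, 1)
worst = lq_distortion(points, cycle, path, identity, math.inf)
```

In this example only the pair {0, 3} is stretched (by a factor of 3), so the average distortion is 8/6 while the worst-case distortion is 3.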

The notion of average distortion is tightly related (see [2]) to that of embedding with scaling distortion 
[19, 1, 2]. 

Definition 2 (Partial/Scaling Embedding). Given two metric spaces $(X,d_X)$ and $(Y,d_Y)$, a partial
embedding is a pair $(f, G)$, where $f$ is a non-contractive embedding of $X$ into $Y$, and $G \subseteq \binom{X}{2}$. The
distortion of $(f,G)$ is defined as: $\mathrm{dist}(f, G) = \sup_{\{u,v\} \in G} \mathrm{dist}_f(u, v)$. For $\epsilon \in [0,1)$, a $(1 - \epsilon)$-partial
embedding is a partial embedding such that $|G| \ge (1 - \epsilon)\binom{n}{2}$.$^1$ Given two metric spaces $(X,d_X)$ and $(Y,d_Y)$
and a function $\alpha : [0, 1) \to \mathbb{R}^+$, we say that an embedding $f : X \to Y$ has scaling distortion $\alpha$ if for any
$\epsilon \in [0,1)$, there is some set $G(\epsilon)$ such that $(f, G(\epsilon))$ is a $(1 - \epsilon)$-partial embedding with distortion at most
$\alpha(\epsilon)$.
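
Definition 2 can be checked mechanically once the per-pair distortions of an embedding are known. The sketch below is an illustration only: the helper names are hypothetical, and the scaling check is a sufficient test that assumes $\alpha$ is non-increasing (as $\sqrt{1/\epsilon}$ is).

```python
import math

def partial_distortion(distortions, eps):
    """Best distortion of a (1 - eps)-partial embedding: drop the floor(eps * N) worst pairs."""
    total = len(distortions)
    dropped = math.floor(eps * total + 1e-9)  # |G| >= (1 - eps) * N
    return sorted(distortions)[total - dropped - 1]

def has_scaling_distortion(distortions, alpha):
    """Sufficient check for non-increasing alpha: the i-th worst pair is at most alpha(i / N)."""
    total = len(distortions)
    worst_first = sorted(distortions, reverse=True)
    return all(worst_first[i - 1] <= alpha(i / total) for i in range(1, total + 1))

# Hypothetical per-pair distortions (six pairs, one of them stretched by 3).
ratios = [1, 1, 3, 1, 1, 1]
full = partial_distortion(ratios, 0)          # nothing may be dropped
partial = partial_distortion(ratios, 1 / 6)   # the single worst pair may be dropped
```

The check compares the $i$-th worst pair against $\alpha(i/N)$ because for every $\epsilon < i/N$ fewer than $i$ pairs may be excluded from $G(\epsilon)$.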

We prove the following theorems: 

Theorem 1. Any $n$-point metric space embeds into an ultrametric with scaling distortion $O(\sqrt{1/\epsilon})$. In
particular, its $\ell_q$-distortion is $O(1)$ for $1 \le q < 2$, $O(\sqrt{\log n})$ for $q = 2$, and $O(n^{1-2/q})$ for $2 < q \le \infty$.

Theorem 2. Any weighted graph of size $n$ contains a spanning tree with scaling distortion $O(\sqrt{1/\epsilon})$. In
particular, its $\ell_q$-distortion is $O(1)$ for $1 \le q < 2$, $O(\sqrt{\log n})$ for $q = 2$, and $O(n^{1-2/q})$ for $2 < q \le \infty$.

We show that the bounds in Theorems 1 and 2 are tight for the n-node cycle even for embeddings into 
arbitrary tree metrics. We also obtain an equivalent result for probabilistic embedding into spanning trees: 

Theorem 3. Any weighted graph of size $n$ probabilistically embeds into a spanning tree with scaling distortion
$O(\log^2(1/\epsilon))$. In particular, its $\ell_q$-distortion is $O(1)$ for any fixed $1 \le q < \infty$.$^2$
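
The passage from scaling distortion to the $\ell_q$ bounds of Theorems 1 and 2 is a short calculation. The following is a sketch under stated assumptions rather than the authors' own derivation: the constant $C$, the count $N = \binom{n}{2}$ and the sorted distortions $D_1 \ge \dots \ge D_N$ are notation introduced here, and the scaling distortion is taken to be $\alpha(\epsilon) = C\sqrt{1/\epsilon}$.

```latex
% Assumptions: alpha(eps) = C * sqrt(1/eps); N = binom(n,2);
% D_1 >= ... >= D_N are the distortions of the pairs, worst first.
% For eps = i/(2N) at most eps*N = i/2 < i pairs exceed alpha(eps), hence D_i <= alpha(i/(2N)).
\begin{align*}
\mathrm{dist}_q(f)^q
  = \frac{1}{N}\sum_{i=1}^{N} D_i^{q}
  &\le \frac{1}{N}\sum_{i=1}^{N}\Big(C\sqrt{2N/i}\Big)^{q}
  = C^{q}\,2^{q/2}\,N^{q/2-1}\sum_{i=1}^{N} i^{-q/2}, \\
\sum_{i=1}^{N} i^{-q/2} &\le
\begin{cases}
  \dfrac{N^{1-q/2}}{1-q/2} & 1 \le q < 2,\\[1ex]
  1+\ln N & q = 2,\\[1ex]
  \zeta(q/2) & 2 < q < \infty.
\end{cases}
\end{align*}
% Taking q-th roots and using N <= n^2 (for fixed q):
%   1 <= q < 2 : dist_q(f) = O(1)
%   q = 2      : dist_2(f) = O(sqrt(log n))
%   2 < q      : dist_q(f) = O(N^{1/2 - 1/q}) = O(n^{1 - 2/q})
%   q = infty  : D_1 <= alpha(1/(2N)) = C * sqrt(2N) = O(n)
```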

1.1 Related Work 

Embedding metrics into trees and ultrametrics was introduced in the context of probabilistic embedding in
[6]. Other related results on embedding into ultrametrics include work on metric Ramsey theory [9], multi-embeddings
[11] and dimension reduction [10]. Embedding an arbitrary metric into a tree metric requires
$\Omega(n)$ distortion in the worst case even for the metric of the $n$-cycle [20]. It is a simple fact [15, 9, 6] that
any $n$-point metric embeds in an ultrametric with distortion $n - 1$. However the known constructions are
not scaling and have average distortion linear in $n$. The probabilistic embedding theorem [13, 8] (improving
earlier results of [6, 7]) states that any $n$-point metric space probabilistically embeds into an ultrametric
with distortion $O(\log n)$. This result has been the basis for many algorithmic applications (see [18]). This
theorem implies the existence of a single ultrametric with average distortion $O(\log n)$ (a constructive version
was given in [8]). This bound was later improved with the analysis of [1], as we discuss below. The study
of partial embedding and scaling distortion was initiated by Kleinberg, Slivkins and Wexler [19], and later
studied in [1, 2]. Abraham et al. [1] prove that any finite metric space probabilistically embeds in an
ultrametric with scaling distortion $O(\log(1/\epsilon))$, implying constant average distortion. As mentioned above,
since the distortion is bounded in expectation, this result implies the existence of a single ultrametric with
constant average distortion, but does not bound the $\ell_q$-distortion. In [2] we have studied in depth the
notions of average distortion and $\ell_q$-distortion and their relation to partial and scaling embeddings. Our
main focus was the study of optimal scaling embeddings for embedding into $L_p$ spaces. For embedding of

1 Note that the embedding is strictly partial only if $\epsilon \ge 1/\binom{n}{2}$.

2 Note that probabilistic embedding bounds on the $\ell_q$-distortion do not imply an embedding into a single tree with the same
bounds, with the exception of $q = 1$.






metrics into ultrametrics, we mentioned that partial embeddings exist with distortion $O(\sqrt{1/\epsilon})$, matching
the lower bound from [1]. Theorem 1 significantly strengthens this result by providing an embedding with
scaling distortion. That is, the bound holds for all values of $0 < \epsilon < 1$ simultaneously and therefore the
embedding has bounded $\ell_q$-distortion. It is a basic fact that the minimum spanning tree in an $n$-point
weighted graph preserves the (shortest paths) metric associated with the graph up to a factor of $n - 1$ at
most. This bound is tight for the $n$-cycle. Here too, it is easy to see that the MST does not have scaling
distortion, and may result in linear average distortion. Alon, Karp, Peleg and West [3] studied the problem
of computing a spanning tree of a graph with small average stretch (over the edges of the graph). This can
also be viewed as the dual of probabilistic embedding of the graph metric in spanning trees. Their work was
recently significantly improved by Elkin, Emek, Spielman and Teng [12], who show that any weighted graph
contains a spanning tree with average stretch $O(\log^2 n \log\log n)$. This result can also be rephrased in terms
of the average distortion (but not the $\ell_q$-distortion) over all pairs. For spanning trees, this paper gives the
first construction with constant average distortion.

1.2 Discussion of Techniques 

Theorem 1 uses partitioning techniques similar to those used in the context of the metric Ramsey problem [5,
9]. However, in our case we need to provide an argument for the existence of a partition which simultaneously
satisfies multiple conditions, one for every possible value of $\epsilon$. Theorem 2 builds on the technique above
together with the method of Elkin et al. [12] for constructing a spanning tree. A straightforward application of
this approach loses an extra $O(\log n)$ factor and hence does not give a scaling distortion depending solely
on $\epsilon$. The loss in the approach of Elkin et al. stems from the need to bound the diameter in the recursive
construction of the spanning tree. In each level of the construction we may allow only a very small increase,
as these increases get multiplied in the bound on the total blow-up of the overall diameter. In their original work [12]
the increase per level is $O(1/\log n)$, which translates into the blow-up in the distortion. In our case we show
that the increase can decrease exponentially along the levels. This indeed guarantees a good blow-up in the
overall diameter but is awful in terms of the distortion. We apply a new technique for bounding the diameter
which allows us to limit the number of levels involved. On the other hand it is clear that for every value of $\epsilon$
there is a limited number of levels for which the distortion requirement imposes new constraints. The proof
then proceeds to carefully balance these different arguments. Theorem 3 uses essentially the same ideas
together with the known probabilistic embedding methods (in fact, the proof of this theorem is somewhat
less technically involved). The fact that these theorems are tight essentially follows from the results and
techniques of [1, 2].



2 Preliminaries 

Consider a finite metric space $(X, d)$ and let $n = |X|$. For any point $x \in X$ and a subset $S \subseteq X$ let
$d(x,S) = \min_{s \in S} d(x, s)$. The diameter of $X$ is denoted $\mathrm{diam}(X) = \max_{x,y \in X} d(x, y)$. For a point $x \in X$
and $r \ge 0$, the ball at radius $r$ around $x$ is defined as $B_X(x, r) = \{z \in X \mid d(x, z) \le r\}$. We omit the subscript
$X$ when it is clear from the context. Given $x \in X$ let $\mathrm{rad}_x(X) = \max_{y \in X} d(x,y)$. When a cluster $X$ has a
center $x \in X$ that is clear from the context we will omit the subscript and write $\mathrm{rad}(X)$ instead of $\mathrm{rad}_x(X)$.
Given an edge-weighted graph $G = (X, E, \omega)$ with $\omega : E \to \mathbb{R}^+$, let $(X, d)$ be the metric space induced from
the graph in the usual manner: vertices are associated with points, and distances between points correspond to
shortest-path distances in $G$.

Definition 3. An ultrametric $U$ is a metric space $(U, d_U)$ whose elements are the leaves of a rooted labelled
tree $T$. Each $v \in T$ is associated with a label $\Phi(v) \ge 0$ such that if $u \in T$ is a descendant of $v$ then $\Phi(u) \le \Phi(v)$,
and $\Phi(u) = 0$ iff $u \in U$ is a leaf. The distance between leaves $x, y \in U$ is defined as $d_U(x,y) = \Phi(\mathrm{lca}(x, y))$,
where $\mathrm{lca}(x,y)$ is the least common ancestor of $x$ and $y$ in $T$.
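
Definition 3 translates directly into a small data structure. The following Python sketch is an illustration only; the class `Node` and the function `ultrametric_distance` are hypothetical names, and both leaves are assumed to belong to the same tree.

```python
class Node:
    """A node of the labelled tree T; leaves carry label 0 and represent the points of U."""
    def __init__(self, label=0.0, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            if child.label > label:
                raise ValueError("labels must not increase towards the leaves")
            child.parent = self

def ultrametric_distance(x, y):
    """Label of the least common ancestor of the leaves x and y."""
    ancestors_of_x = set()
    node = x
    while node is not None:
        ancestors_of_x.add(id(node))
        node = node.parent
    node = y
    while id(node) not in ancestors_of_x:
        node = node.parent  # climb from y until the path of x is reached
    return node.label

# Hypothetical example: leaves a, b under a node labelled 1, leaf c directly under the root labelled 3.
a, b, c = Node(), Node(), Node()
inner = Node(1.0, [a, b])
root = Node(3.0, [inner, c])
```

Because labels only grow towards the root, the resulting distance satisfies the strong triangle inequality $d_U(x,z) \le \max\{d_U(x,y), d_U(y,z)\}$.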
3 Scaling embedding into an ultrametric



Theorem 4. Any $n$-point metric space embeds into an ultrametric with scaling distortion $O(\sqrt{1/\epsilon})$. In
particular, its $\ell_q$-distortion is $O(1)$ for $1 \le q < 2$, $O(\sqrt{\log n})$ for $q = 2$, and $O(n^{1-2/q})$ for $2 < q \le \infty$.

We give the proof for scaling distortion. The bounds on the $\ell_q$-distortion follow
by a simple calculation. The proof is by induction on the size of $X$ (the base case, where $|X| = 1$,
is trivial). Assume the claim is true for any metric space with fewer than $n$ points. Let $(X, d)$ be a metric
space with $n = |X|$ and $\Delta = \mathrm{diam}(X)$. The ultrametric is defined in a standard manner by defining
the labelled tree $T$ whose leaf-set is $X$. The high level construction of $T$ is as follows: find a partition
$P$ of $X$ into $X_1$ and $X_2 = X \setminus X_1$; the root of $T$ will be labelled $\Delta$, and its children $T_1, T_2$ will be the
trees formed recursively from the ultrametric trees of $X_1$ and $X_2$ respectively. Let $u \in X$ be such that
$|B(u, \Delta/2)| \le n/2$ (such a point can always be found). For any $0 < \epsilon \le 1$ denote by $B_\epsilon(X)$ the total
number of pairs $(x,y) \in \binom{X}{2}$ such that $d_T(x,y) > (150/\sqrt{\epsilon})\, d_X(x,y)$. For a partition $P = (X_1; X_2)$ let
$B_\epsilon(P) = |\{(x,y) \mid x \in X_1 \wedge y \in X_2 \wedge d_X(x,y) < (\sqrt{\epsilon}/150) \cdot \Delta\}|$.
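
The high level recursion above can be sketched in a few lines of Python. This is a simplified illustration and not the construction of the paper: the center $u$ is chosen among the two endpoints of a diameter, and the cut radius is the placeholder $r = \Delta/4$ instead of the carefully chosen radius that the proof requires, so only non-contraction is guaranteed, not the scaling bound. The function name and the assumption of distinct points with closed balls are mine.

```python
import itertools

def build_ultrametric(points, d):
    """Return {frozenset({x, y}): tree distance} for a recursively built ultrametric."""
    points = list(points)
    distances = {}
    if len(points) < 2:
        return distances
    u, far = max(itertools.combinations(points, 2), key=lambda pair: d(*pair))
    delta = d(u, far)  # root label: the diameter of the current cluster
    ball = lambda center, r: [p for p in points if d(center, p) <= r]
    if len(ball(far, delta / 2)) < len(ball(u, delta / 2)):
        u, far = far, u  # prefer the endpoint with the smaller half-diameter ball
    r = delta / 4  # placeholder radius (assumption); the paper selects r far more carefully
    x1 = ball(u, r)
    x2 = [p for p in points if d(u, p) > r]
    for x in x1:
        for y in x2:
            distances[frozenset((x, y))] = delta  # separated pairs meet at the root
    distances.update(build_ultrametric(x1, d))
    distances.update(build_ultrametric(x2, d))
    return distances

# Hypothetical example: four points on the real line.
line = [0, 1, 2, 10]
tree = build_ultrametric(line, lambda x, y: abs(x - y))
```

Every separated pair receives the diameter of its cluster as tree distance, which is why the embedding never contracts a distance.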

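Claim 1. Let $\epsilon \in (0, 1]$ and let $(X, d)$ be a metric space. If for any sub-metric $X' \subseteq X$ there exists a partition
$P = (X_1; X_2)$ of $X'$ such that $B_\epsilon(P) \le \epsilon|X_1| \cdot |X_2|$, then $B_\epsilon(X) \le \epsilon\binom{|X|}{2}$.

Proof. Let $P = (X_1; X_2)$ be a partition of $X$ such that $B_\epsilon(P) \le \epsilon|X_1| \cdot |X_2|$. By induction,

$B_\epsilon(X) \le B_\epsilon(P) + B_\epsilon(X_1) + B_\epsilon(X_2)$

$\le \epsilon|X_1| \cdot |X_2| + \epsilon\binom{|X_1|}{2} + \epsilon\binom{|X_2|}{2}$

$= \epsilon/2 \left(|X_1|^2 - |X_1| + |X_2|^2 - |X_2| + 2|X_1| \cdot |X_2|\right)$

$= \epsilon/2 \left((|X_1| + |X_2|)(|X_1| + |X_2| - 1)\right) = \epsilon\binom{|X|}{2}$. □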
So it is sufficient to show that there exists a partition satisfying Claim 1 for all $\epsilon \in (0, 1]$ simultaneously.

Partition Algorithm. Let $\hat{\epsilon} = \max\{\epsilon \in (0, 1] \mid |B(u, \sqrt{\epsilon}\Delta/4)| \ge \epsilon n\}$. Observe that $1/n \le \hat{\epsilon} \le 1/2$ by
the choice of $u$. Define the intervals $\hat{S} = [\sqrt{\hat{\epsilon}}\Delta/4, \sqrt{\hat{\epsilon}}\Delta/2]$, $S = [(\frac{1}{4} + \frac{1}{25})\sqrt{\hat{\epsilon}}\Delta, (\frac{1}{2} - \frac{1}{25})\sqrt{\hat{\epsilon}}\Delta]$, $s = \frac{17}{100}\sqrt{\hat{\epsilon}}\Delta$,
and the shell $Q = \{w \mid d(u,w) \in \hat{S}\}$. We partition $X$ by choosing some $r \in S$ such that $X_1 = B(u,r)$ and
$X_2 = X \setminus X_1$. The following property will be used in several cases:

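Claim 2. $|B(u, \sqrt{\hat{\epsilon}}\Delta/2)| < 4\hat{\epsilon}n$.

Proof. There are two cases: If $\hat{\epsilon} \le 1/4$ then $|B(u, \sqrt{\hat{\epsilon}}\Delta/2)| = |B(u, \sqrt{4\hat{\epsilon}}\Delta/4)| < 4\hat{\epsilon}n$ (otherwise we get a contradiction
to the maximality of $\hat{\epsilon}$). Otherwise, $\hat{\epsilon} \in (1/4, 1]$. In such a case $|B(u, \sqrt{\hat{\epsilon}}\Delta/2)| \le |B(u, \Delta/2)| \le n/2 < 2\hat{\epsilon}n$. □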
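
The quantity $\hat{\epsilon}$ is easy to compute from the sorted distances to $u$. The sketch below is an illustration under assumptions: balls are taken to be closed, the function names are hypothetical, and the search is restricted to the values $\epsilon = k/n$, which suffices because the ball size is an integer (at the maximum either $|B(u, \sqrt{\epsilon}\Delta/4)| = \epsilon n$ or $\epsilon = 1$).

```python
import math

def ball_size(dists_from_u, radius):
    """Number of points within the closed ball of the given radius around u."""
    return sum(1 for x in dists_from_u if x <= radius)

def epsilon_hat(dists_from_u, delta):
    """Largest eps in (0, 1] with |B(u, sqrt(eps) * delta / 4)| >= eps * n."""
    n = len(dists_from_u)
    for k in range(n, 0, -1):  # candidates eps = k / n, largest first
        eps = k / n
        if ball_size(dists_from_u, math.sqrt(eps) * delta / 4) >= k:
            return eps
    raise ValueError("the list must contain u itself at distance 0")

# Hypothetical example: distances from u in a space of diameter 4.
dists = [0, 0.1, 0.2, 3, 3.5, 4, 4, 4]
delta = 4
eps_hat = epsilon_hat(dists, delta)
inner_ball = ball_size(dists, math.sqrt(eps_hat) * delta / 2)
```

On this example three points lie very close to $u$, so $\hat{\epsilon} = 3/8$, and the ball of radius $\sqrt{\hat{\epsilon}}\Delta/2$ is far smaller than the $4\hat{\epsilon}n$ allowed by Claim 2.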

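We will now show that some choice of $r \in S$ will produce a partition that satisfies Claim 1 for all
$\epsilon \in (0, 32\hat{\epsilon}]$. For any $r \in S$ and $\epsilon \le 32\hat{\epsilon}$ let $S_r(\epsilon) = (r - \sqrt{\epsilon}\Delta/150, r + \sqrt{\epsilon}\Delta/150)$, $s(\epsilon) = \sqrt{\epsilon}\Delta/75$, and
let $Q_r(\epsilon) = \{w \mid d(u,w) \in S_r(\epsilon)\}$. Notice that for any $r \in S$ and any $\epsilon \le 32\hat{\epsilon}$: $S_r(\epsilon) \subseteq \hat{S}$. We say that
property $A_r(\epsilon)$ holds if cutting at radius $r$ is "good" for $\epsilon$, formally: $A_r(\epsilon)$ iff $|Q_r(\epsilon)| < \sqrt{\epsilon \cdot \hat{\epsilon}/2} \cdot n$. For any
$\epsilon \le 32\hat{\epsilon}$, note that in any partition into $X_1 = B(u, r)$, $X_2 = X \setminus X_1$ only pairs $(x, y)$ such that $x, y \in Q_r(\epsilon)$ are
distorted by more than $O(\sqrt{1/\epsilon})$. If property $A_r(\epsilon)$ holds then $B_\epsilon(P) \le \epsilon \cdot \hat{\epsilon} n^2/2$. Since $\hat{\epsilon} n \le |X_1| \le n/2$
then $\epsilon \cdot \hat{\epsilon} n^2/2 \le \epsilon n/2 \cdot |X_1| \le \epsilon|X_1||X_2|$, so $A_r(\epsilon)$ implies Claim 1 for $\epsilon$. Hence for $\epsilon \in (0, 32\hat{\epsilon}]$ the following is
sufficient: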

Claim 3. There exists some $r \in S$ such that property $A_r(\epsilon)$ holds for all $\epsilon \in (0, 32\hat{\epsilon}]$.
Proof. The proof is based on the following iterative process that greedily deletes the "worst" interval in $S$.
Initially, let $I_0 = S$, and $j = 1$:

1. If for all $r \in I_{j-1}$ and for all $\epsilon \le 32\hat{\epsilon}$ property $A_r(\epsilon)$ holds then set $t = j - 1$, stop the iterative process
and output $I_t$.

2. Let $\mathcal{S}_j = \{S_r(\epsilon) \mid r \in I_{j-1}, \epsilon \le 32\hat{\epsilon}, \neg A_r(\epsilon)\}$. We greedily remove the interval $S \in \mathcal{S}_j$ that has maximal
$\epsilon$. Formally, let $r_j, \epsilon_j$ be parameters such that $S_{r_j}(\epsilon_j) \in \mathcal{S}_j$ and $\epsilon_j = \max\{\epsilon \mid \exists S_r(\epsilon) \in \mathcal{S}_j\}$.

3. Set $I_j = I_{j-1} \setminus S_{r_j}(\epsilon_j)$, set $j = j + 1$, and go to 1.

Let $\mathcal{Q} = \{Q_r(\epsilon)\}$ and note that $|\mathcal{Q}| = O(n^2)$; it is easy to show that for every $j \in \{1, \ldots, t\}$, $Q' \in \mathcal{Q}$,
the maximum of $\{\epsilon \mid S_r(\epsilon) \in \mathcal{S}_j, Q_r(\epsilon) = Q'\}$ is obtained inside the set and can be found in $O(n^2)$ time.

We now argue that $I_t \ne \emptyset$ and hence such a value $r \in S$ can be found. Since for any $1 \le j < i \le t$,
$s(\epsilon_j) \ge s(\epsilon_i)$, it follows that any $x \in Q$ appears in at most 2 "bad" intervals. From this and Claim 2:

$\sum_{j=1}^{t} |Q_{r_j}(\epsilon_j)| \le 2|Q| \le 8\hat{\epsilon}n.$

Recall that since $A_{r_j}(\epsilon_j)$ does not hold, for any $1 \le j \le t$: $|Q_{r_j}(\epsilon_j)| \ge \sqrt{\epsilon_j \cdot \hat{\epsilon}/2} \cdot n$, which implies that

$\sum_{j=1}^{t} \sqrt{\epsilon_j} \le 8\sqrt{2\hat{\epsilon}} \le 12\sqrt{\hat{\epsilon}}.$

On the other hand, by definition

$\sum_{j=1}^{t} s(\epsilon_j) = \sum_{j=1}^{t} \sqrt{\epsilon_j}\Delta/75 \le 12/75 \cdot \sqrt{\hat{\epsilon}}\Delta = 16/100 \cdot \sqrt{\hat{\epsilon}}\Delta.$

Since $s = 17/100 \cdot \sqrt{\hat{\epsilon}}\Delta$ then indeed $I_t \ne \emptyset$, so any $r \in I_t$ satisfies the condition of the claim. □
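
Property $A_r(\epsilon)$ is a simple counting condition on the ring of points around the cut radius. The following Python sketch is illustrative: the function names are hypothetical, and the threshold is read as $\sqrt{\epsilon\hat{\epsilon}/2}\cdot n$, the value consistent with the bound $B_\epsilon(P) \le \epsilon\cdot\hat{\epsilon}n^2/2$ used in the text.

```python
import math

def ring_size(dists_from_u, r, eps, delta):
    """|Q_r(eps)|: points whose distance from u lies in the open interval S_r(eps)."""
    half_width = math.sqrt(eps) * delta / 150
    return sum(1 for x in dists_from_u if r - half_width < x < r + half_width)

def property_a(dists_from_u, r, eps, eps_hat, delta):
    """True if cutting at radius r is 'good' for eps (threshold is an assumption, see lead-in)."""
    n = len(dists_from_u)
    return ring_size(dists_from_u, r, eps, delta) < math.sqrt(eps * eps_hat / 2) * n

# Hypothetical example: distances from u, diameter 4, and eps_hat = 3/8.
dists = [0, 0.1, 0.2, 3, 3.5, 4, 4, 4]
good_cut = property_a(dists, r=0.9, eps=0.375, eps_hat=0.375, delta=4)
bad_cut = property_a(dists, r=3.5, eps=1e-6, eps_hat=0.375, delta=4)
```

The second call shows why the condition must hold for all $\epsilon$ at once: a radius that passes through a point fails the test for every sufficiently small $\epsilon$, since the ring always contains that point while the threshold tends to zero.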

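It remains to show that any choice of $r \in S$ will produce a partition that satisfies Claim 1 for all
$\epsilon \in (32\hat{\epsilon}, 1]$.

Claim 4. If $\epsilon \in (32\hat{\epsilon}, 1]$, $r \in S$ and $P = (B(u,r); X \setminus B(u,r))$ then $B_\epsilon(P) \le \epsilon|X_1| \cdot |X_2|$.

Proof. Let $\epsilon \in (32\hat{\epsilon}, 1]$ and fix some $r \in S$. Only pairs $(x, y)$ such that $x \in X_1$ and $y \in B(u, r + \sqrt{\epsilon}\Delta/16) \cap X_2$
can be distorted by more than $16\sqrt{1/\epsilon}$ and hence may be counted in $B_\epsilon(P)$. Since $\sqrt{\hat{\epsilon}} \le \sqrt{\epsilon/2}/4$ and
$r \le \sqrt{\hat{\epsilon}}\Delta/2$ then $|B(u, r + \sqrt{\epsilon}\Delta/16)| \le |B(u, \sqrt{\epsilon/2}(\frac{1}{8} + \frac{1}{8})\Delta)| = |B(u, \sqrt{\epsilon/2}\Delta/4)| < \epsilon n/2$ by the maximality
of $\hat{\epsilon}$. Since $|X_2| \ge n/2$ it follows that $B_\epsilon(P) \le \epsilon|X_1| \cdot |X_2|$, as required. □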

Proof of Theorem 1. From Claim 3 and Claim 4, it follows that our partition scheme finds a cut $P =
(X_1; X_2)$ such that $B_\epsilon(P) \le \epsilon|X_1| \cdot |X_2|$ for all $\epsilon$. Hence when applying the partition scheme inductively, by
Claim 1 the theorem follows. □

4 Scaling Embedding into a Spanning Tree 

Here we extend the techniques of the previous section, in conjunction with the constructions of [12], to
achieve the following:

Theorem 5. Any weighted graph of size $n$ contains a spanning tree with scaling distortion $O(\sqrt{1/\epsilon})$. In
particular, its $\ell_q$-distortion is $O(1)$ for $1 \le q < 2$, $O(\sqrt{\log n})$ for $q = 2$, and $O(n^{1-2/q})$ for $2 < q \le \infty$.
Given a graph, the spanning tree is created by recursively partitioning the metric space using a hierarchical
star partition. The algorithm has three components, with the following high level description:

1. A decomposition algorithm that creates a single cluster. The decomposition algorithm is similar in
spirit to the decomposition algorithm used in the previous section for metric spaces. We will later explain
the main differences.

2. A star partition algorithm. This algorithm partitions a graph $X$ into a central ball $X_0$ with center $x_0$
and a set of cones $X_1, \ldots, X_m$, and also outputs a set of edges of the graph $(y_1, x_1), \ldots, (y_m, x_m)$ that connect
each cone set, $x_i \in X_i$, to the central ball, $y_i \in X_0$. The central ball is created by invoking the decomposition
algorithm with center $x_0$ to obtain a cluster whose radius is in the range $[(1/2)\mathrm{rad}_{x_0}(X), (5/8)\mathrm{rad}_{x_0}(X)]$.
Each cone set $X_i$ is created by invoking the decomposition algorithm on the "cone-metric" obtained from
$x_0, x_i$. Informally, a ball in the cone-metric around $x_i$ with radius $r$ is the set of all points $x$ such that
$d(x_0, x_i) + d(x_i, x) - d(x_0, x) \le r$. Hence each cone $X_i$ is a ball whose center is $x_i$ in some appropriately
defined "cone-metric". The radius of each ball in the cone metric is chosen to be $\approx \tau^k \mathrm{rad}_{x_0}(X)$, where
$\tau < 1$ is some fixed constant and $k$ is the depth of the recursion. Unfortunately, at some stage the radius
may be too small for the decompose algorithm to perform well enough. In such cases we must reset the
parameters that govern the radius of the cones (in the next item we define more accurately how
the recursion is performed and when this parameter of a cluster may be reset). The main property of this
star decomposition is that for any point $x \in X_i$, the distance to the center $x_0$ does not increase by too
much. More formally, $d_{X_0 \cup \{(y_i,x_i)\} \cup X_i}(x_0, x)/d(x_0, x) \le \prod_{j \le k}(1 + \tau^j)$, where $k$ is the depth of the recursion.
Informally, this property is used in order to obtain a constant blow-up in the diameter of each cluster in the
final spanning tree.

3. Recursive application of the star partition. As mentioned in the previous item, the radii of the
balls in the cone metric are exponentially decreasing. However, at certain stages in the recursion the cone
radius becomes too small and the parameters governing the cone radius must be reset. Clusters in which
the parameters need to be restarted are called reset clusters. The two parameters that are associated with
a reset cluster $X$ are $n = |X|$ and $\Delta = \mathrm{rad}(X)$. Specifically, a cluster is called a reset cluster if its size
relative to the size of the last reset cluster is larger than some constant times its radius relative to the radius of
the last reset cluster. In that case $n$ and $\Delta$ are updated to the values of the current cluster. This implies
that reset clusters have small diameter, hence their total contribution to the increase of the radius is small.
Moreover, resetting the parameters allows the decompose algorithm to continue to produce clusters with
the properties necessary to obtain the desired scaling distortion. Using resets, the algorithm can continue
recursively in this fashion until the spanning tree is formed.
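
The reset rule of the third component can be stated as a one-line test. The Python sketch below is an illustration only: the function names are hypothetical, and the default constant $c = 2e$ is taken from the list of predefined constants given later in this section, on the assumption that it is the constant used in the reset test.

```python
import math

def is_reset_cluster(size, radius, n_last, delta_last, c=2 * math.e):
    """A cluster resets when its relative size exceeds c times its relative radius."""
    return size / n_last > c * radius / delta_last

def child_parameters(size, radius, n_last, delta_last, c=2 * math.e):
    """Parameters (n, Delta) passed to the recursive call on a child cluster."""
    if is_reset_cluster(size, radius, n_last, delta_last, c):
        return size, radius        # reset: start counting from this cluster
    return n_last, delta_last      # otherwise inherit from the last reset cluster

# Hypothetical example: the last reset cluster has 100 points and radius 10.
inherited = child_parameters(50, 1.0, 100, 10.0)
reset = child_parameters(60, 1.0, 100, 10.0)
```

A cluster that still holds many points but has already shrunk to a small radius triggers a reset, which is why reset clusters contribute little to the total increase in radius.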

Decompose algorithm. The decompose algorithm receives as input several parameters. First it obtains
a pseudo-metric space $(W, d)$ and a point $u$ (for the central ball this is just the shortest-paths metric, while for
cones this pseudo-metric is the so-called "cone-metric", which will be formally defined in the sequel). The
goal of the decompose algorithm is to partition $W$ into a cluster which is a ball $Z = B(u, r)$ and $\bar{Z} = W \setminus Z$.

Informally, this partition $P$ is carefully chosen to maintain the scaling property: for every $\epsilon$, the number
of pairs whose distortion is too large is "small enough". Let $\Delta$ be a parameter corresponding to the radius
of the cluster over which the star-partition is performed. Pairs that are separated by the partition
risk being at distance $\Theta(\Delta)$ in the constructed spanning tree. We denote by $B_\epsilon(P)$ the
number of pairs that may be distorted by at least $\Omega(\sqrt{1/\epsilon})$ if the distance between them grows to $\Delta$.
There are several parameters that control the number of pairs in $B_\epsilon(P)$. Given a parameter $n \ge |W|$, which
corresponds to the size of the last reset cluster containing $W$, we expect the number of "bad" pairs for a
specific value of $\epsilon$ to be at most $O(\epsilon|Z| \cdot (n - |Z|))$. To control this bound even more tightly we have an
additional parameter $\beta$, so that the partition $P$ will have the property that $B_\epsilon(P) = O(\epsilon|Z| \cdot (n - |Z|) \cdot \beta)$.
However, if we insist that this property holds for all $\epsilon$, we cannot maintain a small enough bound on
the maximum value of the radius $r$. Since this value determines the amount of increase in the radius of the
cluster, we would like to be able to bound it. Therefore, we keep another parameter, denoted $\epsilon_{\lim}$. That is,
the partition $P$ will be good only for those values of $\epsilon$ satisfying $\epsilon \le \epsilon_{\lim}$.

The radius $r$ of the ball is controlled by the parameters $\Delta$, $\theta$ and a value $\alpha \le \sqrt{\epsilon_{\lim}}$. The guarantee is
that $r \in [\theta\Delta, (\theta + \alpha)\Delta]$. Recall that $\Delta$ corresponds to the radius of the cluster over which the star-partition
is performed. For the central ball of the star-partition $\theta$ is fixed to $1/2$, and for the star's cones it is fixed
to 0. Indeed, as indicated above, the value of $\epsilon_{\lim}$ determines the increase in the radius of the cluster by
setting the value of $\alpha$. This cannot, however, be set arbitrarily small if all of the partition's
properties are to be satisfied, and so $\epsilon_{\lim}$ must be set above some minimum value of $|W|/(n \cdot \beta)$. Intuitively, we can only keep
$\alpha$ small if $|W| \ll n$.

Let us now explain how the decompose algorithm will be used within our overall scheme. The parameter
$\beta$ is chosen such that it is bounded by $\mu^k$, where $\mu < 1$ is some fixed constant and $k$ is the depth of the
recursion from the last reset cluster. Hence, for every $\epsilon$ that is smaller than $\epsilon_{\lim}$, the property obtained by
the decompose algorithm is that the number of newly distorted edges is at most $O(\epsilon|Z| \cdot (n - |Z|) \cdot \mu^k)$. For
$\epsilon$ that are larger than $\epsilon_{\lim}$, we show that the number of points in the current cluster is less than an $\epsilon$ fraction
of the number of points in the last reset cluster, hence we can discard all the pairs in such clusters and the
total sum of all such discarded pairs is small. Therefore, the total number of distorted edges is bounded by
summing the distorted edges over all clusters, for each cluster depending on whether $\epsilon$ is smaller or larger
than the $\epsilon_{\lim}$ of that cluster. The bound obtained also uses the fact that $\mu^k$ is a geometric series.

Now, if $X$ is not a reset cluster then $|X|/n$ is small compared to the ratio of its radius and the radius of
the last reset cluster. We show that this ratio drops exponentially, bounded by $(\frac{5}{8})^k$, where $k$ is the depth
of the recursion since the last reset cluster. By letting $\epsilon_{\lim} = |X|/(n \cdot \beta)$, and by the choice of $\mu$, we maintain that
$\alpha \le \sqrt{\epsilon_{\lim}} = \tau^k$ for some $\tau < 1$, as desired.

We now turn to the formal description of the algorithm and its analysis. We will make use of the
following predefined constants: $c = 2e$, $c' = e(2e + 1)$, $\hat{c} = 22$, and $\hat{C} = 8\sqrt{c \cdot \hat{c}}$. Finally, the distortion is
given by $C = 150\hat{C} \cdot c'$. For any $0 < \epsilon \le 1$ denote by $B_\epsilon(X)$ the total number of pairs $(x,y) \in \binom{X}{2}$ such
that $d_T(x,y) > (C/\sqrt{\epsilon})\, d_X(x,y)$. The exact properties of the decomposition algorithm are captured by the
following Lemma:

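Lemma 5. Given a metric space $(W,d)$, a point $u \in W$ and parameters $n \in \mathbb{N}$, $\Delta > 0$, and $\beta, \theta \ge 0$, there
exists an algorithm decompose$((W, d), u, \Delta, \theta, n, \epsilon_{\lim}, \beta)$ that computes a partition $P = (Z; \bar{Z})$ of $W$ such that
$Z = B_{(W,d)}(u, r)$ and $r/\Delta \in [\theta, \theta + \alpha]$ where $\alpha = \sqrt{\epsilon_{\lim}}/\hat{C}$. Let $B_\epsilon(P) = |\{(x,y) \mid x \in Z \wedge y \in \bar{Z} \wedge d(x, y) < \frac{\sqrt{\epsilon}\Delta}{75\hat{C}}\}|$.
For $n \ge |W|$ and $\epsilon_{\lim} \ge |W|/(\beta n)$ the partition has the property that for any $\epsilon \in (0, \epsilon_{\lim}]$:

$B_\epsilon(P) \le \epsilon|Z| \cdot (n - |Z|) \cdot \beta.$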

Star-Partition algorithm. Consider a cluster $X$ with center $x_0$ and parameters $n, \Delta$. Recall that the parameters
$n, \Delta$ are the number of points and the radius (respectively) of the last reset cluster. A star-partition
partitions $X$ into a central ball $X_0$, cone-sets $X_1, \ldots, X_m$ and edges $(y_1, x_1), \ldots, (y_m, x_m)$; the value $m$ is
determined by the star-partition algorithm when no more cones are required. Each cone-set $X_i$ is connected
to $X_0$ by the edge $(y_i, x_i)$, $y_i \in X_0$, $x_i \in X_i$. Denote by $P_0$ the partition creating the central ball $X_0$ and by
$\{P_i\}_{i=1}^{m}$ the partitions creating the cones. In order to create the cone-set $X_i$ use the decompose algorithm
on the cone-metric $\ell^{x_0}_{x_i}$ defined below.

Definition 4 (cone metric$^3$). Given a metric space $(X,d)$, a set $Y \subset X$, $x \in X$, $y \in Y$, define the cone-metric
$\ell^x_y : Y^2 \to \mathbb{R}^+$ as $\ell^x_y(u,v) = |(d(x, u) - d(y, u)) - (d(x,v) - d(y,v))|$.

Note that $B_{(Y,\ell^x_y)}(y,r) = \{v \in Y \mid d(x, y) + d(y,v) - d(x,v) \le r\}$.
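
The cone-metric and the ball identity noted above can be checked on a small example. The following Python sketch is illustrative only; `cone_metric` and `cone_ball` are hypothetical helper names, and the example uses points on the real line.

```python
def cone_metric(d, x, y):
    """Return the cone-metric l^x_y as a function of two points u, v."""
    def ell(u, v):
        return abs((d(x, u) - d(y, u)) - (d(x, v) - d(y, v)))
    return ell

def cone_ball(points, d, x, y, r):
    """Ball of radius r around y in the cone-metric l^x_y, restricted to the given points."""
    ell = cone_metric(d, x, y)
    return [v for v in points if ell(y, v) <= r]

# Hypothetical example: x = 0 plays the role of the center, y = 5 the apex of the cone.
d = lambda a, b: abs(a - b)
subspace = [3, 5, 7, 9]
ell = cone_metric(d, 0, 5)
cone = cone_ball(subspace, d, 0, 5, 1)
```

Here the points 7 and 9 lie "behind" 5 as seen from 0, so their cone distance from 5 is zero even though they are distinct points, which illustrates why the cone-metric is only a pseudo-metric.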

Hierarchical-Star-Partition algorithm. Given a graph $G = (X, E, \omega)$, create the tree by choosing some
$x \in X$, setting $X$ as a reset cluster and calling: hierarchical-star-partition$(X, x, |X|, \mathrm{rad}_x(X))$.

3 In fact, the cone-metric is a pseudo-metric.
$(X_0, \ldots, X_m, (y_1, x_1), \ldots, (y_m, x_m))$ = star-partition$(X, x_0, n, \Delta)$:

1. Set $i = 0$; $\beta = \frac{1}{\hat{c}}\left(\frac{\mathrm{rad}_{x_0}(X)}{\Delta}\right)^{1/4}$; $\epsilon_{\lim} = |X|/(\beta n)$; $\Delta = \mathrm{rad}_{x_0}(X)$;

2. $(X_0, Y_0)$ = decompose$((X,d), x_0, \Delta, 1/2, \epsilon_{\lim}, \beta)$;

3. If $Y_i = \emptyset$ set $m = i$ and stop; otherwise, set $i = i + 1$;

4. Let $(x_i, y_i)$ be an edge in $E$ such that $y_i \in X_0$, $x_i \in Y_{i-1}$;

5. Let $\ell = \ell^{x_0}_{x_i}$ be the cone-metric of $x_0, x_i$ on the subspace $Y_{i-1}$;

6. $(X_i, Y_i)$ = decompose$((Y_{i-1}, \ell), x_i, \Delta, 0, \epsilon_{\lim}, \beta)$;

7. go to 3;



$T$ = hierarchical-star-partition$(X, x, n, \Delta)$:

1. If $|X| = 1$ set $T = X$ and stop.

2. $(X_0, \ldots, X_m, (y_1, x_1), \ldots, (y_m, x_m))$ = star-partition$(X, x, n, \Delta)$;

3. For each $i \in [1, \ldots, m]$:

4. If $\frac{|X_i|}{n} \le c\, \frac{\mathrm{rad}_{x_i}(X_i)}{\Delta}$ then $T_i$ = hierarchical-star-partition$(X_i, x_i, n, \Delta)$;

5. Otherwise, set $X_i$ to be a reset cluster, $T_i$ = hierarchical-star-partition$(X_i, x_i, |X_i|, \mathrm{rad}_{x_i}(X_i))$;

6. Let $T$ be the tree formed by connecting $T_0$ with $T_i$ using the edge $(y_i, x_i)$ for each $i \in [1, \ldots, m]$;



4.1 Algorithm Analysis 

The hierarchical star-partition of $G = (X,E,\omega)$ naturally induces a laminar family $\mathcal{F} \subseteq 2^X$. Let $\mathcal{G}$ be the
rooted construction tree whose nodes are the sets in $\mathcal{F}$; $F \in \mathcal{F}$ is a parent of $F' \in \mathcal{F}$ if $F'$ is a cluster formed by
the partition of $F$. Observe that the spanning tree $T$ obtained by our hierarchical star decomposition has the
property that every $F \in \mathcal{F}$ corresponds to a sub-tree $T[F]$ of $T$. Let $\mathcal{R} \subseteq \mathcal{F}$ be the set of all reset clusters.
For each $F \in \mathcal{F}$ let $\mathcal{G}_F$ be the sub-tree of the construction tree $\mathcal{G}$ rooted at $F$ that contains all the nodes $X$
whose path to $F$ (excluding $F$ and $X$) contains no node in $\mathcal{R}$. For $F \in \mathcal{F}$ let $\mathcal{R}(F) \subseteq \mathcal{R}$ be the set of reset
clusters which are descendants of $F$ in $\mathcal{G}_F$ (these are the leaves of the construction sub-tree $\mathcal{G}_F$ rooted at $F$).
In what follows we use the following convention on our notation: whenever $X$ is a cluster in $\mathcal{G}$ with center
point $x_0$ with respect to which the star-partition of $X$ has been constructed, we define $\mathrm{rad}(X) = \mathrm{rad}_{x_0}(X)$.
We first claim the following bound on $\alpha$ produced by the decompose algorithms.

Claim 6. Fix $F \in \mathcal{F}$ and $\mathcal{G}_F$. Let $X \in \mathcal{G}_F \setminus \mathcal{R}(F)$ be such that $d_{\mathcal{G}}(X,F) = k$. By our construction, in each
iteration of the partition algorithm the radius decreases by a factor of at least $\frac{5}{8}$, hence $\mathrm{rad}(X) \le \mathrm{rad}(F) \cdot (\frac{5}{8})^k$.

Proof. For any cluster $F$, the radius of the central ball in the star decomposition of $F$ is at most $((1/2) +
\alpha)\mathrm{rad}(F)$. Since the radius of this ball is also at least $(1/2)\mathrm{rad}(F)$, the radius of each cone is at most
$((1/2) + \alpha)\mathrm{rad}(F)$ as well. Let $Y \in \mathcal{R}$ be such that $X \in \mathcal{G}_Y$. Since $\hat{C} = 8\sqrt{c \cdot \hat{c}}$ then $\alpha = \sqrt{\epsilon_{\lim}}/\hat{C} =$



Figure 1: star-partition algorithm 



Figure 2: hierarchical-star-partition algorithm 




□ 






We now show that the spanning tree of each cluster increases its diameter by at most a constant factor.
Recall that $c' = e(2e + 1)$.

Lemma 7. For every $F \in \mathcal{F}$ and $T[F] \subseteq T$ we have $\mathrm{rad}(T[F]) \le c' \cdot \mathrm{rad}(F)$.

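Proof. Let $Y \in \mathcal{R}$. We first prove by induction on the construction tree $\mathcal{G}$ that for every $X \in \mathcal{G}_Y$ with
$t = d_{\mathcal{G}}(X, Y)$ we have

(1) $\mathrm{rad}(T[X]) \le \prod_{j \ge t}\left(1 + \tfrac{1}{8}(\tfrac{7}{8})^j\right)\Big(\mathrm{rad}(X) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_X} \mathrm{rad}(T[R])\Big)$

Fix some cluster $X \in \mathcal{G}_Y$ such that $t = d_{\mathcal{G}}(X, Y)$ and assume the hypothesis is true for all its children
in $\mathcal{G}_Y$. If $X$ is a leaf of $\mathcal{G}_Y$ then it is a reset cluster and the claim trivially holds (since $X \in
\mathcal{R}(Y) \cap \mathcal{G}_X$). Otherwise, assume we partition $X$ into $X_0, \ldots, X_m$. Let $i \in [1,m]$ be such that $X_i$ is the cluster
for which $\omega(y_i, x_i) + \mathrm{rad}(T[X_i])$ is maximal, hence $\mathrm{rad}(T[X]) \le \mathrm{rad}(T[X_0]) + \omega(y_i, x_i) + \mathrm{rad}(T[X_i])$.
There are four cases to consider depending on whether $X_0$ and $X_i$ belong to $\mathcal{R}$. Here we show the
case of $X_0, X_i \notin \mathcal{R}$; the other cases are similar and easier. Using Claim 6 we obtain the following
bound on the increase in radius: $\alpha \le \frac{1}{8}\sqrt{\left(\frac{\mathrm{rad}(X)}{\mathrm{rad}(Y)}\right)^{3/4}} \le \frac{1}{8}(\frac{5}{8})^{3t/8} \le \frac{1}{8}(\frac{7}{8})^t$. It follows that
$\mathrm{rad}(X_0) + \omega(y_i, x_i) + \mathrm{rad}(X_i) \le \mathrm{rad}(X)(1 + \alpha) \le \mathrm{rad}(X)(1 + \frac{1}{8}(\frac{7}{8})^t)$. By the induction hypothesis
we know that $\mathrm{rad}(T[X_0]) \le \prod_{j \ge t+1}(1 + \frac{1}{8}(\frac{7}{8})^j)(\mathrm{rad}(X_0) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_{X_0}} \mathrm{rad}(T[R]))$ and $\mathrm{rad}(T[X_i]) \le
\prod_{j \ge t+1}(1 + \frac{1}{8}(\frac{7}{8})^j)(\mathrm{rad}(X_i) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_{X_i}} \mathrm{rad}(T[R]))$, hence

$\mathrm{rad}(T[X]) \le \mathrm{rad}(T[X_0]) + \omega(y_i, x_i) + \mathrm{rad}(T[X_i])$

$\le \prod_{j \ge t+1}\left(1 + \tfrac{1}{8}(\tfrac{7}{8})^j\right)\Big(\mathrm{rad}(X_0) + \omega(y_i, x_i) + \mathrm{rad}(X_i) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_X} \mathrm{rad}(T[R])\Big)$

$\le \prod_{j \ge t+1}\left(1 + \tfrac{1}{8}(\tfrac{7}{8})^j\right)\Big(\mathrm{rad}(X)\left(1 + \tfrac{1}{8}(\tfrac{7}{8})^t\right) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_X} \mathrm{rad}(T[R])\Big)$

$\le \prod_{j \ge t}\left(1 + \tfrac{1}{8}(\tfrac{7}{8})^j\right)\Big(\mathrm{rad}(X) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_X} \mathrm{rad}(T[R])\Big)$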

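This completes the proof of (1). Now we continue to prove the Lemma. First, we prove by induction on the
construction tree $\mathcal{G}$ that the Lemma holds for the set of reset clusters. In fact we show a somewhat stronger
bound. Recall that $c = 2e$. We show that for every cluster $Y \in \mathcal{R}$ we have $\mathrm{rad}(T[Y]) \le c \cdot \mathrm{rad}(Y)$. Assume
the induction hypothesis is true for all descendants of $Y$ in $\mathcal{R}$. In particular, for all $R \in \mathcal{R}(Y)$, $\mathrm{rad}(T[R]) \le c \cdot
\mathrm{rad}(R)$. Recall that $R$ becomes a reset cluster since $\mathrm{rad}(R) \le \frac{\mathrm{rad}(Y)}{c|Y|}|R|$, hence $\sum_{R \in \mathcal{R}(Y)} \mathrm{rad}(R) \le \mathrm{rad}(Y)/c$.
Using (1) we have that

$\mathrm{rad}(T[Y]) \le \prod_{j \ge 0}\left(1 + \tfrac{1}{8}(\tfrac{7}{8})^j\right)\Big(\mathrm{rad}(Y) + \sum_{R \in \mathcal{R}(Y)} \mathrm{rad}(T[R])\Big)$

$\le \left(e^{\frac{1}{8}\sum_{j \ge 0}(\frac{7}{8})^j}\right)(\mathrm{rad}(Y) + c \cdot \mathrm{rad}(Y)/c)$

$\le e \cdot 2\,\mathrm{rad}(Y) = c \cdot \mathrm{rad}(Y).$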
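
The factor $e$ in the last step comes from $1 + x \le e^x$ together with $\frac{1}{8}\sum_{j \ge 0}(\frac{7}{8})^j = 1$. A quick numerical sanity check of this step, written as a small Python sketch (the function name and the truncation at 200 levels are choices made here, not part of the paper):

```python
import math

def radius_blowup(levels=200):
    """Partial product of (1 + (1/8) * (7/8)**j) over j = 0, ..., levels - 1."""
    product = 1.0
    for j in range(levels):
        product *= 1 + (1 / 8) * (7 / 8) ** j
    return product

blowup = radius_blowup()
exponent = sum((1 / 8) * (7 / 8) ** j for j in range(200))  # approaches 1
```

The partial products are increasing and stay below $e$, so the accumulated increase in radius over all levels since the last reset cluster is bounded by a constant.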

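Finally, we show the Lemma holds for all the other clusters. Let $F \in \mathcal{F} \setminus \mathcal{R}$ and $Y \in \mathcal{R}$ such that $F \in \mathcal{G}_Y$. Let $t = d_{\mathcal{G}}(F,Y)$. Note that $\sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} |R| = |F|$. Since $F \notin \mathcal{R}$ we have $\frac{\mathrm{rad}(Y)}{c|Y|} < \frac{\mathrm{rad}(F)}{|F|}$, hence

$$\sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} \mathrm{rad}(R) \le \frac{\mathrm{rad}(F)}{|F|} \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} |R| \le \mathrm{rad}(F).$$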
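By (1) and the second induction we get

$$\begin{aligned}
\mathrm{rad}(T[F]) &\le \prod_{j \ge t}\left(1 + \tfrac{1}{8}(7/8)^j\right)\Big(\mathrm{rad}(F) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} \mathrm{rad}(T[R])\Big) \\
&\le e \cdot \Big(\mathrm{rad}(F) + c \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} \mathrm{rad}(R)\Big) \\
&\le e \cdot \mathrm{rad}(F)(c + 1) = c' \cdot \mathrm{rad}(F),
\end{aligned}$$

proving the Lemma. □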



We now proceed to bound, for every $\epsilon$, the number of pairs with distortion $\Omega(\sqrt{1/\epsilon})$, thus proving the scaling distortion of our constructed spanning tree. We begin with some definitions that will be crucial in the analysis.

Definition 5. For each $\epsilon \in (0,1]$ and $R \in \mathcal{R}$ let $\mathcal{K}(R,\epsilon) = \{F \in \mathcal{G}_R \mid |F| < \epsilon/c \cdot |R|\}$.

Hence, a cluster is in $\mathcal{K}(R,\epsilon)$ if it contains less than an $\epsilon/c$ fraction of the points of $R$. Informally, when counting the badly distorted edges for a given $\epsilon$, whenever we reach a cluster in $\mathcal{K}(R,\epsilon)$ we count all its pairs as bad. If $X \in \mathcal{G}_R$ then let $\mathcal{K}(X,\epsilon) = \mathcal{K}(R,\epsilon) \cap \mathcal{G}_X$. For $R \in \mathcal{R}$ let $\mathcal{G}_{R,\epsilon}$ be the sub-tree rooted at $R$ that contains all the nodes $X$ whose path to $R$ (excluding $R$ and $X$) contains no node in $\mathcal{R} \cup \mathcal{K}(R,\epsilon)$. Observe that $\mathcal{G}_{R,\epsilon}$ is a sub-tree of $\mathcal{G}_R$.

Lemma 8. For any $R \in \mathcal{R}$, $\epsilon \in (0,1]$ we have that $B_\epsilon(R) \le \epsilon\binom{|R|}{2}$.

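Proof. Fix some $\epsilon \in (0,1]$. Fix $F \in \mathcal{R}$. In order to prove the claim for $F$, we will first prove the following inductive claim for all $X \in \mathcal{G}_F$. Let $t = d_{\mathcal{G}}(X,F)$. Let $\mathcal{E}(X) = \binom{X}{2} \setminus \Big(\bigcup_{R \in \mathcal{R}(X)} \binom{R}{2} \cup \bigcup_{K \in \mathcal{K}(X,\epsilon)} \binom{K}{2}\Big)$.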

(2) B e (X)<yeJ2(9/W-\£(X)\+ E E B ^ K )- 

i>t Reiz(F)ng x KeK{F.t)ng x 

The base of the induction, where $X$ is a leaf in $\mathcal{G}_F$, i.e. $X \in \mathcal{R}(F) \cup \mathcal{K}(F,\epsilon)$, is trivial. Assume the claim holds for all the children $X_0, \ldots, X_m$ of $X$. Let $P = \{P_i\}_{i=0}^{m}$ be the star-partition of $X$, where $P_i = (X_i, Y_i)$,

*i = U™ =4+1 X 3 . Recall the definition of P e (P) = |{(x,y) x £ I, A i; £ 7, A <a;,2/) < tSc?}|, where 
A = rad(A). Denote B e (P) = E™o^( P 0- By Lemma 7 we have that rad(T(A)) < c'rad(A). Hence, the 
number of pairs distorted more than 150C ■ cfy/TJe by the partition P is bounded by P e (P). Now, since 
X $ JC(F,e) then e < c ■ |A|/|P| < 1/(3 ■ \X\/\F\ = eu m . Hence we can apply Lemma 5 to deduce a bound 

on B e (Pi). By Claim 6 we have P = j [ radfn ) — i(f )*^ 4 - From Lemma 5 we obtain 



4(P) - E^( P *) ^ g • e(g) t/4 £ ^ g • ^(9/10) 4 |f (A)|. 

i=0 i=0 



n(l + i(^) J ) rad(P)+ E rad ™ 
j>* \ i?eK(y)ne F 

e • rad(P) + c E rad(P) 

\ fl6R(r)nej; 
e • rad(P)(c + 1) = c' ■ rad(P), 
Using the induction hypothesis we get that

m 

B e (X) < B 6 (P) + X> e (Xj) 



< e (9/10) 4 |£:(X)|+^ \j-e\£(X 3 )\ J2 (9/10)'+ E B e (iE) + E B e (#) 



i>t+l 



fl€7?.(F)nex j 



K£K.(F,e)ng Xj 



< ye(9/10Y\£(X)\ + - d .e\S(X)\ E (9/10)*+ E B e (fl) + E B e (#) 

i>t+l R£lZ(F)ng x K£K.(F,e)ng x 



< _. e £(9/10) < |£P0l + ^ 5 £ (i?) + ^ B £ (tf), 



which proves the inductive claim. We now prove the Lemma by induction on the construction tree $\mathcal{G}$. Let $F \in \mathcal{R}$. By the induction hypothesis $B_\epsilon(R) \le \epsilon\binom{|R|}{2}$ for every $R \in \mathcal{R}(F)$. Observe that if $K \in \mathcal{K}(F,\epsilon)$ then we discard all pairs in $K$. Hence $B_\epsilon(K) \le |K|^2 \le \frac{1}{c} \cdot \epsilon|F| \cdot |K|$. Recall that $c = 22$. From (2) we obtain



B e (F) < _. e ^(9/10r-|W|+ e 



i>0 



< 



< 



< 



R£U(F 

'R 



R 



20 1 



ReTZ(F) 



E 2r e|FM ^ 

ifG/C(f,e) 

E |£|.(|i?|-l)/2+-L-|F| £ 



Ren(F) 



KeK{F,t) 



22 \ 2 22 



\f\\ E E 1*1 



R<£TZ(F) 



20 1 



-e-liW-1) 



22 V 2 J 22 

F\ 
2 

where the third inequality follows from the definition of $\mathcal{E}(X)$ and from the fact that for each $K \in \mathcal{K}(F,\epsilon)$, $R \in \mathcal{R}(F)$ we have $K \cap R = \emptyset$. □

Applying Lemma 8 on the original graph proves Theorem 2. Finally, we complete the proof of Lemma 5 
stating the properties of our generic decompose algorithm. 

Proof of Lemma 5. We distinguish between the following two cases: 

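Case 1: $|B(u,(\theta + \alpha/2)\Delta)| \le n/2$. In this case let $\hat\epsilon = \max\{\epsilon \in (0,\epsilon_{\lim}] \mid |B(u,(\theta + \frac{\sqrt{\epsilon}}{4C})\Delta)| \ge \epsilon \cdot \beta \cdot n\}$. Let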
S=l(9 + £)A,(9+£)A)^ndS= [(o + $ ( \ + £)) A, (*+^ (§_£)) A 

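Case 2: $|B(u,(\theta + \alpha/2)\Delta)| > n/2$. In this case let $\hat\epsilon = \max\{\epsilon \in [0,\epsilon_{\lim}] \mid |W \setminus B(u,(\theta + \alpha - \frac{\sqrt{\epsilon}}{4C})\Delta)| \ge \epsilon \cdot \beta \cdot n\}$.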



Let 5 = [{6 + a - &)A, (0 + a - |g)A], and S 



We show that one can choose $r \in S$ and define $Z = B(u,r)$ such that the property of the Lemma holds. We now show the property of the Lemma holds for all $\epsilon \in (32\hat\epsilon, \epsilon_{\lim}]$ and any $r \in S$.

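Proof for Case 1: In this case we will use the bound:

(3) $B_\epsilon(P) \le |B(u, r + \sqrt{\epsilon}\Delta/(150C)) \setminus Z| \cdot |Z|$.

Note that $r + \sqrt{\epsilon}\Delta/(150C) \le (\theta + \sqrt{\hat\epsilon}/(2C))\Delta + \sqrt{\epsilon/2}\,\Delta/(8C) \le (\theta + \sqrt{\epsilon/2}/(4C))\Delta$, using that $\hat\epsilon < \epsilon/32$. Now, by the maximality of $\hat\epsilon$ we have $|B(u,(\theta + \sqrt{\epsilon/2}/(4C))\Delta)| < \epsilon/2 \cdot \beta \cdot n$. Therefore, using (3) we get

$$\begin{aligned}
B_\epsilon(P) &\le |B(u,(\theta + \sqrt{\epsilon/2}/(4C))\Delta)| \cdot |Z| \\
&\le (\epsilon \cdot \beta \cdot n/2) \cdot |Z| \le \epsilon \cdot \beta \cdot |Z| \cdot (n - |Z|),
\end{aligned}$$

using $|Z| \le n/2$.

Proof for Case 2: In this case we will use the bound (where $\bar{Z} = W \setminus Z$):

(4) $B_\epsilon(P) \le |\bar{Z}| \cdot |W \setminus B(u, r - \sqrt{\epsilon}\Delta/(150C))|$.

Note that $r - \sqrt{\epsilon}\Delta/(150C) \ge (\theta + \alpha - \sqrt{\hat\epsilon}/(2C))\Delta - \sqrt{\epsilon/2}\,\Delta/(8C) \ge (\theta + \alpha - \sqrt{\epsilon/2}/(4C))\Delta$, using that $\hat\epsilon < \epsilon/32$. Now, from the maximality of $\hat\epsilon$ we have $|W \setminus B(u,(\theta + \alpha - \sqrt{\epsilon/2}/(4C))\Delta)| < \epsilon \cdot \beta \cdot n/2$. Therefore, using (4) we get

$$\begin{aligned}
B_\epsilon(P) &\le |\bar{Z}| \cdot |W \setminus B(u,(\theta + \alpha - \sqrt{\epsilon/2}/(4C))\Delta)| \\
&\le |\bar{Z}| \cdot \epsilon \cdot \beta \cdot n/2 \le \epsilon \cdot \beta \cdot |Z|(n - |Z|),
\end{aligned}$$

using $|Z| \ge n/2$.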

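We next show that the property of the Lemma holds for all $\epsilon \in (0, 32\hat\epsilon]$. We will prove the claim for Case 1. The argument for Case 2 is analogous. As before we define $Q = \{w \mid d(u,w) \in S\}$. Now we have

Claim 9. $|Q| \le 4 \cdot \hat\epsilon \cdot \beta \cdot n$.

Proof. We have $Q \subseteq B(u,(\theta + \sqrt{\hat\epsilon}/(2C))\Delta)$. We distinguish between 2 cases: If $\hat\epsilon \le \epsilon_{\lim}/4$ then $|B(u,(\theta + \sqrt{4\hat\epsilon}/(4C))\Delta)| < 4\hat\epsilon \cdot \beta \cdot n$ (by the maximality of $\hat\epsilon$). Otherwise, $\hat\epsilon \in (\epsilon_{\lim}/4, \epsilon_{\lim}]$. In this case $|Q| \le |W| \le \epsilon_{\lim} \cdot \beta \cdot n < 4\hat\epsilon \cdot \beta \cdot n$. □

As before we will choose some $r \in S$ and the partition $P$ will be $Z = B(u,r)$, $\bar{Z} = W \setminus Z$. It is easy to check that for any $r \in S$ we get $\hat\epsilon \cdot n \cdot \beta \le |Z| \le n/2$. We now find $r \in S$ which satisfies the property of the Lemma for all $0 < \epsilon \le 32\hat\epsilon$: For any $r \in S$ and $\epsilon \le 32\hat\epsilon$ let $S_r(\epsilon) = [r - \sqrt{\epsilon}\Delta/(150C),\ r + \sqrt{\epsilon}\Delta/(150C)]$, $s(\epsilon) = \sqrt{\epsilon}\Delta/(75C)$ and let $Q_r(\epsilon) = \{w \mid d(u,w) \in S_r(\epsilon)\}$. Note that the length of the interval $S$ is given by $s = 17/(100C) \cdot \sqrt{\hat\epsilon}\Delta$. We say that property $A_r(\epsilon)$ holds if cutting at radius $r$ is "good" for $\epsilon$, formally: $A_r(\epsilon)$ iff $|Q_r(\epsilon)| \le \sqrt{\epsilon \cdot \hat\epsilon/2} \cdot n \cdot \beta$. Notice that only pairs $(x,y)$ such that $x,y \in Q_r(\epsilon)$ may be distorted by more than $150C\sqrt{1/\epsilon}$.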

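Claim 10. There exists some $r \in S$ such that property $A_r(\epsilon)$ holds for all $\epsilon \in (0, 32\hat\epsilon]$.

Proof. As in the proof of Claim 3, we conduct exactly the same iterative process that greedily deletes the "worst" intervals in $S$, which are $\{S_{r_j}(\epsilon_j)\}_{j=1}^{t}$, and we remain with $I_t \subseteq S$. We now argue that $I_t \ne \emptyset$. As before we have $\sum_{j=1}^{t} |Q_{r_j}(\epsilon_j)| \le 2|Q| \le 8\hat\epsilon \cdot \beta \cdot n$. Recall that since $A_{r_j}(\epsilon_j)$ does not hold then for any $1 \le j \le t$: $|Q_{r_j}(\epsilon_j)| > \sqrt{\epsilon_j \cdot \hat\epsilon/2} \cdot \beta \cdot n$, which implies that $\sum_{j=1}^{t} \sqrt{\epsilon_j} \le 12\sqrt{\hat\epsilon}$. On the other hand, by definition

$$\sum_{j=1}^{t} s(\epsilon_j) \le \sum_{j=1}^{t} \sqrt{\epsilon_j}\,\Delta/(75C) \le 12/(75C) \cdot \sqrt{\hat\epsilon}\Delta = 16/(100C) \cdot \sqrt{\hat\epsilon}\Delta.$$

Since $s = 17/(100C) \cdot \sqrt{\hat\epsilon}\Delta$ then indeed $I_t \ne \emptyset$, so any $r \in I_t$ satisfies the condition of the claim. □

Claim 10 shows that for any $\epsilon \in (0, 32\hat\epsilon]$ we have

$$B_\epsilon(P) \le \epsilon \cdot \hat\epsilon/2 \cdot (n \cdot \beta)^2 \le \epsilon \cdot \beta \cdot |Z| \cdot (n - |Z|),$$

which concludes the proof of the lemma. □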
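The two numerical steps used in Claim 10 and in the final bound can be spelled out. The following worked computation is a sketch under the reading used here: $\hat\epsilon$ denotes the maximal value fixed in Case 1, the total size of the deleted strips is at most $8\hat\epsilon\beta n$, and $\hat\epsilon\beta n \le |Z| \le n/2$ as stated before Claim 10.

```latex
\begin{align*}
\sqrt{\hat\epsilon/2}\,\beta n \sum_{j=1}^{t} \sqrt{\epsilon_j}
  &< \sum_{j=1}^{t} |Q_{r_j}(\epsilon_j)| \le 8\hat\epsilon\,\beta n
  \;\Longrightarrow\;
  \sum_{j=1}^{t} \sqrt{\epsilon_j} < 8\sqrt{2}\,\sqrt{\hat\epsilon} \le 12\sqrt{\hat\epsilon}, \\
\epsilon\cdot\frac{\hat\epsilon}{2}\,(n\beta)^2
  &= \epsilon\beta\cdot(\hat\epsilon\beta n)\cdot\frac{n}{2}
  \le \epsilon\beta\cdot|Z|\cdot(n-|Z|).
\end{align*}
```

The last inequality uses $\hat\epsilon\beta n \le |Z|$ for the middle factor and $|Z| \le n/2$, i.e. $n/2 \le n - |Z|$, for the last one.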
5 Probabilistic Scaling Embedding into spanning trees



The proof of this theorem is based on a somewhat simpler variation of the decomposition algorithm from 
the previous section. In fact, the hierarchical-star-partition algorithm remains practically the same, 
with modified sub-method probabilistic-star-partition given in Figure 3, instead of star-partition. 
Let $f : \mathbb{R} \to \mathbb{R}_+$ be a monotone non-decreasing function satisfying



i /0) 

For example if we define $\log^{(0)} n = n$, and for any $i > 0$ define recursively $\log^{(i)} n = \log(\log^{(i-1)} n)$, then we

can take for any constants 9 > 0, t G N the function f(n) = cJ^Cq log^(n) ■ flog^(n)^ , for sufficiently 
small constant c > 0, and it will satisfy the conditions. 



$(X_0, \ldots, X_t, (y_1,x_1), \ldots, (y_t,x_t))$ = probabilistic-star-partition$(X, x_0, \Lambda)$:

1. Set $k = 0$; $\Delta = \mathrm{rad}_{x_0}(X)$; $\alpha = \frac{1}{f(\log(2\Lambda/\Delta))}$;

2. Choose uniformly at random $\beta \in [0, 1/8]$.

3. Let $\gamma$ be the value in $\{0, 1/16\}$ minimizing $|B(x_0,(1/2 + \gamma + 1/16)\Delta)| - |B(x_0,(1/2 + \gamma)\Delta)|$.

4. $X_0 = B(x_0,(1/2 + 3\gamma/2 + \beta/4)\Delta)$; $Y_0 = X \setminus X_0$;

5. If $Y_k = \emptyset$ set $t = k$ and stop; Otherwise, set $k = k + 1$;

6. Let $v_k \in Y_{k-1}$ be the point minimizing $\hat\chi_k = \frac{|Y_0|}{|B_{Y_0}(v_k, \alpha\Delta/64)|}$; Set $\chi_k = \max\{4, \hat\chi_k\}$;

7. Choose r G [oA/16, aA/8] according to the distribution p(r) = 32 '" Xk ■ X^ 2r/{aK) ; 

8. Let $(x_k, y_k)$ be the edge in $E$ which lies on a shortest path from $v_k$ to $x_0$ such that $y_k \in X_0$, $x_k \in Y_{k-1}$ (a);

9. Let $\ell_k$ be the cone-metric with respect to $x_0$ and $x_k$ on the subspace $Y_{k-1}$; $X_k = B_{(Y_{k-1},\ell_k)}(x_k, r)$; $Y_k = Y_{k-1} \setminus X_k$.

10. goto 5;

(a) By the definition of cone-metric, if $v_k \in Y_{k-1}$ then all the points on any shortest path from $v_k$ to $x_0$ are either in $X_0$ or in $Y_{k-1}$.

Figure 3: probabilistic-star-partition algorithm
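The choice of the central ball in steps 2 to 4 of Figure 3 can be illustrated in isolation. The following is a minimal sketch, not the authors' implementation: the input format (a flat list of distances from $x_0$), the function name and the tie-breaking rule are assumptions made for this illustration, and the cone construction of steps 5 to 10 is not covered.

```python
import random


def central_ball_radius(dists, delta, rng=random):
    """Pick the central-ball radius following steps 2-4 of Figure 3.

    dists: distances d(x0, v) for every point v of the cluster
           (hypothetical input format chosen for this sketch).
    delta: radius of the cluster measured from x0.
    rng:   object exposing uniform(a, b), e.g. the random module.
    """
    def strip_size(gamma):
        lo = (0.5 + gamma) * delta
        hi = (0.5 + gamma + 1.0 / 16) * delta
        # |B(x0, hi)| - |B(x0, lo)| counts points at distance in (lo, hi].
        return sum(1 for d in dists if lo < d <= hi)

    beta = rng.uniform(0.0, 1.0 / 8)               # step 2
    gamma = min((0.0, 1.0 / 16), key=strip_size)   # step 3 (ties go to 0)
    radius = (0.5 + 1.5 * gamma + beta / 4) * delta  # step 4
    return gamma, beta, radius
```

With $\gamma = 0$ the radius falls in $[1/2, 1/2 + 1/32]\Delta$, and with $\gamma = 1/16$ it falls in $[1/2 + 3/32, 1/2 + 1/8]\Delta$, so the cut always lies inside the sparser of the two strips.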



5.1 Algorithm Analysis

Let $\hat{\mathcal{H}}$ be the distribution on laminar families induced by the algorithm above. Let $\mathcal{H} = \mathrm{supp}(\hat{\mathcal{H}})$. We have the following analogs of Claim 6 and Lemma 7.

Claim 11. Fix $\mathcal{F} \in \mathcal{H}$, $F \in \mathcal{F}$. Let $X \in \mathcal{G}_F \setminus \mathcal{R}(F)$, such that $d_{\mathcal{G}}(X,F) = k$. By our construction, in each iteration of the partition algorithm the radius decreases by a factor of at least 5/8. Hence

$$\mathrm{rad}(X) \le \mathrm{rad}(F) \cdot (5/8)^k.$$

Proof. For any cluster $F$, the radius of the central ball in the star decomposition of $F$ is at most $(5/8)\mathrm{rad}(F)$. Since the radius of this ball is also at least $(1/2)\mathrm{rad}(F)$, the radius of each cone is at most $((1/2) + \alpha/8)\mathrm{rad}(F) \le (5/8)\mathrm{rad}(F)$ as well. □
We now show that the spanning tree of each cluster increases its diameter by at most a constant factor. Recall that $c' = e(2e + 1)$.

Lemma 12. For every $\mathcal{F} \in \mathcal{H}$, $F \in \mathcal{F}$ we have $\mathrm{rad}(T[F]) \le c' \cdot \mathrm{rad}(F)$.

Proof. Let $Y \in \mathcal{R}$. We first prove by induction on the construction tree $\mathcal{G}$ that for every $X \in \mathcal{G}_Y$ with $t = d_{\mathcal{G}}(X,Y)$ we have

(6) $\mathrm{rad}(T[X]) \le \prod_{j \ge t}\left(1 + \frac{1}{8f(1 + j/5)}\right)\Big(\mathrm{rad}(X) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_X} \mathrm{rad}(T[R])\Big)$.

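Fix some cluster $X \in \mathcal{G}_Y$ such that $t = d_{\mathcal{G}}(X,Y)$ and assume the hypothesis is true for all its children in $\mathcal{G}_Y$. If $X$ is a leaf of $\mathcal{G}_Y$ then it is a reset cluster and the claim trivially holds (since $X \in \mathcal{R}(Y) \cap \mathcal{G}_X$). Otherwise, assume we partition $X$ into $X_0, \ldots, X_m$. Let $i \in [1,m]$ be such that $X_i$ is the cluster for which $\omega(y_i,x_i) + \mathrm{rad}(T[X_i])$ is maximal, hence $\mathrm{rad}(T[X]) \le \mathrm{rad}(T[X_0]) + \omega(y_i,x_i) + \mathrm{rad}(T[X_i])$. There are four cases to consider depending on whether $X_0$ and $X_i$ belong to $\mathcal{R}$. Here we show the case of $X_0, X_i \notin \mathcal{R}$; the other cases are similar and easier. Using Claim 11, $\log(2\mathrm{rad}(Y)/\mathrm{rad}(X)) \ge 1 + t/5$, and it follows that

$$\mathrm{rad}(X_0) + \omega(y_i,x_i) + \mathrm{rad}(X_i) \le \mathrm{rad}(X)\left(1 + \frac{1}{8f(\log(2\mathrm{rad}(Y)/\mathrm{rad}(X)))}\right) \le \mathrm{rad}(X)\left(1 + \frac{1}{8f(1 + t/5)}\right).$$

By the induction hypothesis we know that $\mathrm{rad}(T[X_0]) \le \prod_{j \ge t+1}\left(1 + \frac{1}{8f(1+j/5)}\right)\big(\mathrm{rad}(X_0) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_{X_0}} \mathrm{rad}(T[R])\big)$ and $\mathrm{rad}(T[X_i]) \le \prod_{j \ge t+1}\left(1 + \frac{1}{8f(1+j/5)}\right)\big(\mathrm{rad}(X_i) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_{X_i}} \mathrm{rad}(T[R])\big)$, hence

$$\begin{aligned}
\mathrm{rad}(T[X]) &\le \mathrm{rad}(T[X_0]) + \omega(y_i,x_i) + \mathrm{rad}(T[X_i]) \\
&\le \prod_{j \ge t+1}\left(1 + \frac{1}{8f(1+j/5)}\right)\Big(\mathrm{rad}(X_0) + \omega(y_i,x_i) + \mathrm{rad}(X_i) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_X} \mathrm{rad}(T[R])\Big) \\
&\le \prod_{j \ge t+1}\left(1 + \frac{1}{8f(1+j/5)}\right)\Big(\mathrm{rad}(X)\left(1 + \frac{1}{8f(1+t/5)}\right) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_X} \mathrm{rad}(T[R])\Big) \\
&\le \prod_{j \ge t}\left(1 + \frac{1}{8f(1+j/5)}\right)\Big(\mathrm{rad}(X) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_X} \mathrm{rad}(T[R])\Big).
\end{aligned}$$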

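This completes the proof of (6). Now we continue to prove the Lemma. First, we prove by induction on the construction tree $\mathcal{G}$ that the Lemma holds for the set of reset clusters. In fact we show a somewhat stronger bound. Recall that $c = 2e$. We show that for every cluster $Y \in \mathcal{R}$ we have $\mathrm{rad}(T[Y]) \le c \cdot \mathrm{rad}(Y)$. Assume the induction hypothesis is true for all descendants of $Y$ in $\mathcal{R}$. In particular, for all $R \in \mathcal{R}(Y)$, $\mathrm{rad}(T[R]) \le c \cdot \mathrm{rad}(R)$. Recall that $R$ becomes a reset cluster since $\mathrm{rad}(R) \le \frac{\mathrm{rad}(Y)}{c|Y|}|R|$, hence $\sum_{R \in \mathcal{R}(Y)} \mathrm{rad}(R) \le \mathrm{rad}(Y)/c$. Using Equation 6 and then Equation 5, we have that

$$\begin{aligned}
\mathrm{rad}(T[Y]) &\le \prod_{j \ge 0}\left(1 + \frac{1}{8f(1+j/5)}\right)\Big(\mathrm{rad}(Y) + \sum_{R \in \mathcal{R}(Y)} \mathrm{rad}(T[R])\Big) \\
&\le e^{\frac{1}{8}\sum_{j \ge 0} 1/f(1+j/5)}\left(\mathrm{rad}(Y) + c \cdot \mathrm{rad}(Y)/c\right) \\
&\le e^{5/8} \cdot 2\,\mathrm{rad}(Y) \le c \cdot \mathrm{rad}(Y).
\end{aligned}$$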
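The passage from Equation 5 to the exponent $5/8$ uses only the monotonicity of $f$. As a sketch, assuming that Equation 5 provides the normalization $\sum_{i \ge 1} 1/f(i) \le 1$ (this reading is an assumption of the present note), one can group the indices $j$ into blocks of five: for $5(i-1) \le j \le 5i-1$ we have $1 + j/5 \ge i$, hence $f(1+j/5) \ge f(i)$.

```latex
\begin{align*}
\sum_{j \ge 0} \frac{1}{f(1 + j/5)}
  &= \sum_{i \ge 1} \; \sum_{j = 5(i-1)}^{5i-1} \frac{1}{f(1 + j/5)}
  \le \sum_{i \ge 1} \frac{5}{f(i)} \le 5, \\
\prod_{j \ge 0}\Bigl(1 + \frac{1}{8 f(1+j/5)}\Bigr)
  &\le \exp\Bigl(\frac{1}{8}\sum_{j \ge 0}\frac{1}{f(1+j/5)}\Bigr)
  \le e^{5/8}.
\end{align*}
```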

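Finally, we show the Lemma holds for all the other clusters. Let $F \in \mathcal{F} \setminus \mathcal{R}$ and $Y \in \mathcal{R}$ such that $F \in \mathcal{G}_Y$. Let $t = d_{\mathcal{G}}(F,Y)$. Note that $\sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} |R| = |F|$. Since $F \notin \mathcal{R}$ we have $\frac{\mathrm{rad}(Y)}{c|Y|} < \frac{\mathrm{rad}(F)}{|F|}$, hence

$$\sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} \mathrm{rad}(R) \le \frac{\mathrm{rad}(F)}{|F|} \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} |R| \le \mathrm{rad}(F).$$

By (6) and the second induction we get

$$\begin{aligned}
\mathrm{rad}(T[F]) &\le \prod_{j \ge t}\left(1 + \frac{1}{8f(1+j/5)}\right)\Big(\mathrm{rad}(F) + \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} \mathrm{rad}(T[R])\Big) \\
&\le e \cdot \Big(\mathrm{rad}(F) + c \sum_{R \in \mathcal{R}(Y) \cap \mathcal{G}_F} \mathrm{rad}(R)\Big) \\
&\le e \cdot \mathrm{rad}(F)(c + 1) = c' \cdot \mathrm{rad}(F),
\end{aligned}$$

proving the Lemma. □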

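For any $i \ge 0$ let $\hat{\mathcal{H}}^{(i)}$ be the distribution on laminar families induced by $i$ iterations of our probabilistic hierarchical-star-partition algorithm. Let $\mathcal{H}^{(i)} = \mathrm{supp}(\hat{\mathcal{H}}^{(i)})$. Given $\mathcal{F}^{(i)} \in \mathcal{H}^{(i)}$, let $\mathcal{G}^{(i)}$ be the corresponding construction tree of $\mathcal{F}^{(i)}$. Given $\mathcal{F}^{(i)}$, for any $x \in X$ let $F_i(x)$ be the leaf in $\mathcal{G}^{(i)}$ containing $x$.

Given $x,y \in G$ and $j \ge 0$ define events $\mathcal{C}, \mathcal{C}_{ball}, \mathcal{X}, \mathcal{Y}, \mathcal{Z}$ as follows: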

• Let C(x,y,j) be the event that there exists i > and J 7 ' 1 - 1 € T^ 4 * 1 such that the following holds: 

1. (§)' < rad(i^(x)) < (8/5p" +1 . 

2. B(x,d{x,y)) C 

3. B(x,d(x,y)) £Fi{x). 

• Let Cbaii(£, j) be the event that there exists i > and <G Ti^' such that the following holds: 

1. < Tad(Fi(x)) < (8/5y +1 . 

2. Xo = Bp i ^{xo 1 r) and r chosen as in the algorithm. 

3. B(x,d{x,y)) m (X ,Fi{x)\X ). 

4. B(x,d(x,y)) C F i+1 (x). 

Notice that by Claim 11 for each $\mathcal{F}$, the first property holds for at most one value of $i$; denote this value by $i_j(\mathcal{F})$.

• Let X(x, y,j, Z) be the event that there exist i > and g such that the following holds: 

1. (§)'" < rad(i^(x)) < (8/5K+ 1 . 

2. Z = Bp.( x }(xo, r) and r chosen as in the algorithm. 

3. B(x,d(x,y)) C Z. 

• Let y(x,y,j,Z) be the event that there exist i > and J 7 *- 1 - 1 € such that the following holds: 

1. (IY < rad(Fi(x)) < (8/5P+ 1 . 

2. Z = Fi{x) \ Bp.( x -)(xo,r) and r chosen as in the algorithm. 

3. B(x,d(x,y)) C Z. 

• Let Z(x,y,j,Z) =y(x,y,j, Z)U X(x,y,j, Z). 

We omit the parameters $x,y,j$ (or part of them) from $\mathcal{C}, \mathcal{C}_{ball}, \mathcal{X}, \mathcal{Y}, \mathcal{Z}$ when clear from context. Here is an informal description of events $\mathcal{C}, \mathcal{C}_{ball}, \mathcal{X}, \mathcal{Y}, \mathcal{Z}$. Fix $x,y$ and let $B = B(x,d(x,y))$. Event $\mathcal{C}(j)$ is the event that the first time that $B$ is cut is when the parent cluster has radius $\approx (8/5)^j$. Event $\mathcal{C}_{ball}(j)$ is the event that the first time that $B$ is cut is by the central ball, given that the parent cluster has radius $\approx (8/5)^j$; observe that $\mathcal{C}_{ball}(j) \subseteq \mathcal{C}(j)$. Event $\bigcup_Z \mathcal{Z}(j,Z)$ is the complement of $\mathcal{C}(j)$. For each $Z$, event $\mathcal{Z}(j,Z) = \mathcal{Y}(j,Z) \cup \mathcal{X}(j,Z)$; event $\mathcal{X}(j,Z)$ (respectively $\mathcal{Y}(j,Z)$) is the event that $B$ is contained inside (respectively, outside) the central ball of a cluster whose radius is $\approx (8/5)^j$.

For each cluster we define the depth of its local density change as a function of the ratio between its radius and its parent reset radius. The parent reset cluster $Y_i(x)$ of a cluster $F_i(x)$ is defined as follows. For any $i \ge 0$, if $F_i(x) \in \mathcal{R}$ let $Y_i(x) = F_i(x)$, otherwise let $Y_i(x) \in \mathcal{R}$ be such that $F_i(x) \in \mathcal{G}_{Y_i(x)}$. The depth of the local density change is defined as

Definition 6. Let

$$\alpha_i(x) = \frac{1}{f(\log(2\mathrm{rad}(Y_i(x))/\mathrm{rad}(F_i(x))))};$$

notice that $\alpha$ is a uniform function over $\mathcal{F}$, i.e. if $u,v \in F_i(x)$ then $\alpha_i(u) = \alpha_i(v)$.

Given this parameter we define the local density of a node $x$ in a subgraph $Y_0$ as

Definition 7. Let

$$\rho_{Y_0}(x,i) = \frac{|Y_0|}{|B_{Y_0}(x, \alpha_i(x)\mathrm{rad}(F_i(x))/64)|}.$$
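Definitions 6 and 7 can be made concrete on a toy metric. The sketch below is illustrative only: the function names, the explicit distance callback, the base-2 logarithm and the sample function $f(t) = 2t^2$ are assumptions made for this example and are not taken from the paper.

```python
import math


def depth_parameter(f, rad_reset, rad_cluster):
    """alpha = 1 / f(log(2 * rad(Y) / rad(F))), as in Definition 6.

    The logarithm is taken base 2 here, which is an assumption of this sketch.
    """
    return 1.0 / f(math.log2(2.0 * rad_reset / rad_cluster))


def local_density(y0, dist, x, alpha, rad_cluster):
    """rho_{Y0}(x) = |Y0| / |B_{Y0}(x, alpha * rad(F) / 64)|, as in Definition 7.

    y0 must contain x, so the ball is never empty.
    """
    radius = alpha * rad_cluster / 64.0
    ball = [p for p in y0 if dist(x, p) <= radius]
    return len(y0) / len(ball)


if __name__ == "__main__":
    points = list(range(10))            # ten points on a line
    line = lambda a, b: abs(a - b)      # toy metric
    print(local_density(points, line, 0, alpha=1.0, rad_cluster=128.0))
    print(depth_parameter(lambda t: 2.0 * t * t, rad_reset=8.0, rad_cluster=2.0))
```

A small value of the local density means that the ball around $x$ already holds a large share of $Y_0$; this is the situation in which the analysis argues that the ball cannot be cut by a cone.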

We shall use the following Lemma from [2].

Lemma 13. Let $(X,d)$ be a metric space and $Z \subseteq X$. Let $\chi \ge 2$ be a parameter. Given $0 < \Delta < \mathrm{diam}(Z)$ and a center point $v \in Z$, there exists a probability distribution over partitions $(S, \bar{S})$ of $Z$ such that $S = B_{(Z,d)}(v,r)$, and $r$ is chosen from a probability distribution in the interval $[\Delta/4, \Delta/2]$, such that for any $\theta \in (0,1)$ satisfying $\theta \ge \chi^{-1}$, letting $\eta = 2^{-4}\ln(1/\theta)/\ln\chi$, for any $x \in Z$ the following holds:

Pr[B z (x,r)A)>i(S,S)] < 

(1-9) [Pt[B z (x, V A) £S}+2 x ~ 2 ] .. 

Given this lemma we prove a variant of the Uniform Padding Lemma of [2] that is tailored to the construction of our algorithm. There are three main differences. The first difference is that instead of cutting balls we cut cones. The second difference is that the parameter of the cut is defined in a subtle way with respect to the last reset cluster: the local density change of a node is defined as $\rho_{Y_0}(x,i)$, which depends on $\alpha_i(x)$, which depends on $\mathrm{rad}(Y_i(x))/\mathrm{rad}(F_i(x))$ and Equation 5. The final difference is that the hierarchical scheme ensures the relation $\frac{|F_i(x)|}{|Y_i(x)|} \le c\,\frac{\mathrm{rad}(F_i(x))}{\mathrm{rad}(Y_i(x))}$.

Lemma 14. For all $Y_0 \subseteq X$, $x,y \in G_\epsilon$ and $j \ge 0$ such that $d(x,y) \le (8/5)^j/(32f(\log(2c/\epsilon)))$:

$$\Pr[\mathcal{C}(x,y,j) \mid \mathcal{Y}(x,y,j,Y_0)] \le \frac{2^8 d(x,y) \cdot f(\log(2c/\epsilon))}{(8/5)^j} \cdot \ln\left(\frac{|Y_0|}{|B_{Y_0}(x,(8/5)^j/(64f(\log(2c/\epsilon))))|}\right).$$

Proof. Fix $Y_0 \subseteq X$, $x,y \in G_\epsilon$ and $j \ge 0$, such that $d(x,y) \le (8/5)^j/(32f(\log(2c/\epsilon)))$. Let $\mathcal{F}^{(i)}$ be any partial laminar family consistent with the event $\mathcal{Y}(x,y,j,Y_0)$, hence $i = i_j(\mathcal{F}^{(i)})$.

Now we bound the probability that $B(x,d(x,y)) \not\subseteq F_{i+1}(x)$ given $\mathcal{F}^{(i)}$, and that the central ball $X_0$ is disjoint from $B(x,d(x,y))$.

From $B(x,d(x,y)) \subseteq F_i(x)$ it follows that $|F_i(x)| \ge \epsilon n$. We know by the construction and definition of reset clusters that $\frac{|F_i(x)|}{|Y_i(x)|} \le c\,\frac{\mathrm{rad}(F_i(x))}{\mathrm{rad}(Y_i(x))}$, hence $2\mathrm{rad}(Y_i(x))/\mathrm{rad}(F_i(x)) \le 2c/\epsilon$, which implies that $\alpha_i(x) \ge \frac{1}{f(\log(2c/\epsilon))}$.

Let $\Delta = \alpha_i(x)\mathrm{rad}(F_i(x))/4$. For $k \ge 1$ let $v_k$, $x_k$, $\hat\chi_k$ and $\chi_k$ be as in the algorithm, and let $\ell_k$ be the appropriate cone-metric.

Let $\delta = \exp\left(-\frac{2^8 d(x,y) \cdot f(\log(2c/\epsilon))}{(8/5)^j} \cdot \ln\left(\frac{|Y_0|}{|B_{Y_0}(x,(8/5)^j/(64f(\log(2c/\epsilon))))|}\right)\right)$. If $\delta \le e^{-1}$ then the claim is trivial (probability is always bounded by 1), so the interesting cases are when $\delta > e^{-1}$. Let $\theta = \delta^{1/2}$. Note that $\theta \ge \chi_k^{-1}$ as required (the algorithm actually applies Lemma 13 on $(Y_k, \ell_k)$ with $x_k$ as center and the parameter $\chi_k$).

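First consider the case that $\rho_{Y_0}(x,i) < 2$; then we claim that $B_{Y_0}(x,d(x,y))$ cannot be cut by a cone: Since $v_1$ was chosen so as to minimize $\rho_{Y_0}(z,i)$, then $\rho_{Y_0}(v_1,i) < 2$ as well. It implies that both $|B_{Y_0}(v_1,\Delta/16)|, |B_{Y_0}(x,\Delta/16)| > |Y_0|/2$, hence $B_{Y_0}(v_1,\Delta/16) \cap B_{Y_0}(x,\Delta/16) \ne \emptyset$, therefore $d(x,v_1) \le \Delta/8$. Since $d(x,y) \le \Delta/8$ and $\ell_1(v_1,x_1) = 0$, it follows that $B_{Y_0}(x,d(x,y)) \subseteq B_{(Y_0,\ell_1)}(x_1,\Delta/4)$.

Now assume that $\rho_{Y_0}(x,i) \ge 2$. We now claim that for all $x \in Y_{k-1}$, $\eta_k\Delta \ge d(x,y)$. Recall that $\eta_k = 2^{-4}\ln(1/\theta)/\ln\chi_k = 2^{-5}\ln(1/\delta)/\ln\chi_k$, and notice that if $x \in Y_{k-1}$ then $\rho_{Y_0}(x,i) \ge \hat\chi_k$. If $\hat\chi_k < 4$ then $\chi_k = 4$ and $\log\rho_{Y_0}(x,i)/\log\chi_k \ge 1/2$, otherwise $\chi_k = \hat\chi_k$ and $\log\rho_{Y_0}(x,i)/\log\chi_k \ge 1$. Since $\alpha_i(x)\mathrm{rad}(F_i(x)) \ge (8/5)^j/f(\log(2c/\epsilon))$ we get:

$$\eta_k\Delta \ge \frac{2^8 d(x,y)f(\log(2c/\epsilon)) \cdot \log\rho_{Y_0}(x,i)}{2^5(8/5)^j\log\chi_k} \cdot \frac{(8/5)^j}{4f(\log(2c/\epsilon))} \ge d(x,y).$$

It remains to show that if $x \in X_k$ then

$$\Pr[B_{Y_0}(x,\eta_k\Delta) \not\subseteq X_k] \le 1 - \delta \le \frac{2^8 d(x,y) \cdot f(\log(2c/\epsilon))}{(8/5)^j} \cdot \ln\left(\frac{|Y_0|}{|B_{Y_0}(x,(8/5)^j/(64f(\log(2c/\epsilon))))|}\right)$$

as required.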

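Consider the distribution over partitions of $Y_0$ into cones $X_1, X_2, \ldots, X_t$ as defined above. For $1 \le m \le t$, define the events:

$$\mathcal{Z}_m = \{\forall k,\ 1 \le k < m,\ B_{Y_0}(x,\eta_k\Delta) \subseteq Y_k\},$$

$$\mathcal{E}_m = \{\exists k,\ m \le k < t \text{ s.t. } B_{Y_0}(x,\eta_k\Delta) \bowtie (X_k,Y_k) \mid \mathcal{Z}_m\}.$$

We prove the following inductive claim: For every $1 \le m \le t$:

(7) $\Pr[\mathcal{E}_m] \le (1-\theta)\Big(1 + \theta\,\mathbb{E}\Big[\sum_{k \ge m}\chi_k^{-1} \mid \mathcal{Z}_m\Big]\Big)$.

The proof is essentially the same as the one in [2].

Note that $\Pr[\mathcal{E}_t] = 0$. Assume the claim holds for $m+1$ and we will prove for $m$. Define the events:

$$\mathcal{F}_m = \{B_{Y_0}(x,\eta_m\Delta) \bowtie (X_m,Y_m) \mid \mathcal{Z}_m\},$$

$$\mathcal{G}_m = \{B_{Y_0}(x,\eta_m\Delta) \subseteq Y_m \mid \mathcal{Z}_m\} = \{\mathcal{Z}_{m+1} \mid \mathcal{Z}_m\},$$

and let $\bar{\mathcal{G}}_m$ denote the complementary event $\{B_{Y_0}(x,\eta_m\Delta) \not\subseteq Y_m \mid \mathcal{Z}_m\}$.

First we bound $\Pr[\mathcal{F}_m]$. Assume first a particular choice of the cones $X_1, \ldots, X_{m-1}$ such that event $\mathcal{Z}_m$ occurs. Call this specific event $A$; then given that $A$ occurred the point $v_m$ is now determined deterministically, and so is the value of $\chi_m$. Now, applying Lemma 13 we get

$$\Pr[B_{Y_0}(x,\eta_m\Delta) \bowtie (X_m,Y_m) \mid A] \le (1-\theta)\left(\Pr[B_{Y_0}(x,\eta_m\Delta) \not\subseteq Y_m \mid A] + \theta\chi_m^{-1}\right).$$

It follows that

$$\Pr[\mathcal{F}_m] \le (1-\theta)\left(\Pr[\bar{\mathcal{G}}_m] + \theta\,\mathbb{E}[\chi_m^{-1} \mid \mathcal{Z}_m]\right).$$

Using the induction hypothesis we prove the inductive claim:

$$\begin{aligned}
\Pr[\mathcal{E}_m] &\le \Pr[\mathcal{F}_m] + \Pr[\mathcal{G}_m]\Pr[\mathcal{E}_{m+1}] \\
&\le (1-\theta)\left(\Pr[\bar{\mathcal{G}}_m] + \theta\,\mathbb{E}[\chi_m^{-1} \mid \mathcal{Z}_m]\right) + \Pr[\mathcal{G}_m] \cdot (1-\theta)\Big(1 + \theta\,\mathbb{E}\Big[\sum_{k \ge m+1}\chi_k^{-1} \mid \mathcal{Z}_{m+1}\Big]\Big) \\
&\le (1-\theta)\Big(1 + \theta\,\mathbb{E}\Big[\sum_{k \ge m}\chi_k^{-1} \mid \mathcal{Z}_m\Big]\Big).
\end{aligned}$$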

Now consider a fixed choice of star-partition $\{X_0, \ldots, X_t\}$. Since the radius of every cone is at least $\Delta/4$, and since for every $k \in [t]$, $\ell_k(v_k,x_k) = 0$, we get that $B_{(Y_0,d)}(v_k,\Delta/16) \subseteq B_{(Y_0,\ell_k)}(x_k,\Delta/4) \subseteq X_k$. Therefore if $k \ne k'$ then $B_{(Y_0,d)}(v_k,\Delta/16) \cap B_{(Y_0,d)}(v_{k'},\Delta/16) = \emptyset$. Hence, we get:

$$\sum_{k \ge m}\chi_k^{-1} \le \sum_{k \ge m}\hat\chi_k^{-1} = \sum_{k \ge m}\frac{|B_{(Y_0,d)}(v_k,\Delta/16)|}{|Y_0|} \le 1.$$

We conclude that if $x \in X_m$

$$\Pr[B_{Y_0}(x,\eta_m\Delta) \not\subseteq X_m] = \Pr[\mathcal{E}_1] \le (1-\theta)\Big(1 + \theta \cdot \mathbb{E}\Big[\sum_{k \ge 1}\chi_k^{-1}\Big]\Big) \le (1-\theta)(1+\theta) = 1 - \delta.$$

Since $\mathcal{Y}(x,y,j,Y_0)$ holds we have that $B(x,d(x,y)) \subseteq Y_0$, hence indeed $\Pr[B(x,d(x,y)) \not\subseteq X_m] \le 1 - \delta$.

□

We complete the algorithm analysis by proving that the expected distortion is scaling. As with many partition-based schemes that use local density, the core argument is essentially based on the observation that the series $\sum_{a < i \le b}\log\frac{|B(x,2^{i})|}{|B(x,2^{i-1})|}$ is a telescoping series, hence it can be bounded by $\log\frac{|B(x,2^{b})|}{|B(x,2^{a})|}$. When $|B(x,2^a)| \ge \epsilon n$ and $b$ is large enough then this argument gives the essential $O(\log 1/\epsilon)$ scaling ingredient. The following is a technical generalization of this core idea. The main problem is that the local density change $\rho_{Y_0}(x,i)$ of our algorithm is defined as a function of $Y_0$, but $Y_0$ is determined by a probabilistic process. Hence in order for the core telescoping argument to work we need to delicately combine the various probabilistic events in a hierarchical manner. This is done by induction.
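The telescoping observation can be checked on a small example. The sketch below is only an illustration of the core idea, not of the hierarchical induction that follows; the function name and the sample ball sizes are made up for this example.

```python
import math


def telescoping_log_sum(ball_sizes):
    """Sum of log(|B_i| / |B_{i-1}|) over consecutive scales.

    ball_sizes[i] stands for |B(x, 2^(a+i))|; the values are hypothetical.
    """
    return sum(math.log(ball_sizes[i] / ball_sizes[i - 1])
               for i in range(1, len(ball_sizes)))


if __name__ == "__main__":
    n, eps = 64, 1.0 / 16
    sizes = [4, 7, 7, 20, 64]          # smallest ball holds eps * n points
    total = telescoping_log_sum(sizes)
    # The intermediate sizes cancel, leaving log(64 / 4) = log(1 / eps).
    print(total, math.log(1.0 / eps))
```

Whatever the intermediate ball sizes are, only the ratio between the largest and the smallest ball survives, which is why a lower bound of $\epsilon n$ on the smallest ball yields a $\log(1/\epsilon)$ term.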

Lemma 15. For any $x,y \in G_\epsilon$ we have

$$\mathbb{E}[d_T(x,y)] \le O(\log^2(1/\epsilon))\,d(x,y).$$

Proof. For any $\epsilon > 0$ fix some $x,y \in G_\epsilon$. Let $\ell$ be the smallest integer such that $d(x,y) \le (8/5)^\ell/(64f(\log(2c/\epsilon)))$, and let $L = \lceil\log_{8/5}\mathrm{diam}(X)\rceil$. For ease of notation, for any $j \ge 0$ writing $\mathbb{E}_{Z_j}$ means that the expectation is over clusters $Z_j$ such that $(8/5)^j \le \mathrm{rad}(Z_j) < (8/5)^{j+1}$ that contain $B(x,d(x,y)) \subseteq Z_j$, whose distribution is induced by the hierarchical probabilistic star partition algorithm.

Let k — 2 4 c • log( 8 / B ) log(l/e). First we prove by induction on j > i + k the following claim: For any 
m G [£, j — 1] let h = max{m + 1, j — k + 1}, then for any Zj C X: 

i-i 

(8) E ^ t pr P( m ) i Z ( Z ^ i Z (^)i • ( 8 / 5 ) m 

m=l 

3 

< 2 10 c-d(x,y)f(log(l/e)) ^ E Zi 

i=j-k+l 

The base case $j = \ell + k$ is proved similarly to the induction step and we leave it to the reader.

Assume the claim holds for $j$ and prove for $j+1$. Fix any $Z_{j+1} \subseteq X$; for abbreviation let $B_Z(x) = B_Z(x,(8/5)^j/(64f(\log(2c/\epsilon))))$. Let $p_j$ be the probability that $B(x,d(x,y)) \subseteq X_j$, where $X_j$ is the central ball in the star partition of the cluster $Z_{j+1}$. Consider first the last element in the summation:

(9) Pr [C(j) | Z(Z j+1 )].(8/5y 

< Pr [C ball (i) | Z(Z j+1 )} (8/5Y + (1 - Pj )E Yj [Pr[C(j) I y(^)](8/5) j I .. 

Consider the term $\Pr[\mathcal{C}_{ball}(j) \mid \mathcal{Z}(Z_{j+1})]$. We choose the radius $r$ of the central ball to be in the "sparsest" of two disjoint strips around $x_0$: $(1/2, 9/16)\mathrm{rad}(Z_{j+1})$ and $(9/16, 10/16)\mathrm{rad}(Z_{j+1})$, hence only one of them can contain more than half of the points in $Z_{j+1}$, and we will choose $r$ from the other one, which contains less than half of the points.

Moreover, the radius is actually in a sub-strip, i.e. in the interval $(1/2, 1/2 + 1/32)\mathrm{rad}(Z_{j+1})$ or in $(1/2 + 3/32, 1/2 + 1/8)\mathrm{rad}(Z_{j+1})$. Hence if $B(x, \alpha(x)\mathrm{rad}(Z_{j+1})/64)$ intersects one of these sub-strips, it will be fully contained within the appropriate strip (recall that $\alpha \le 1$), which suggests that if $B(x, \alpha(x)\mathrm{rad}(Z_{j+1})/64)$ can be cut by the central ball, it contains less than half of the points in $Z_{j+1}$, i.e. $\rho_{Z_{j+1}}(x,i_j) \ge 2$. We conclude that

(10) Pr[C baU (j) \Z(Z j+1 )](S/5)1 



< 



2d(x,y) 



rad(Z J+1 )/32 (8/5) ' 
< 2 6 d(x,y) ■ pz j+1 (x,ij) 



< 



< 2 9 c-d(x,y)f(\og(l/e)) 
\Zj+i\ 



Pj ■ ^ 



hi 



\BxAx) 



I Z(Z j+1 ) 



+ (l-Pj)E n 



hi 



\z. 



3 + 1 \ 



\ByAx) 



I Z(Z j+1 ) 



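In the third inequality we used that $\alpha_{i_j}(x) \ge 1/f(\log(2c/\epsilon))$, and in the last inequality we simply reduced the size of $B_{Z_{j+1}}(x)$ and added expectations.

As for the term $\mathbb{E}_{Y_j}\left[\Pr[\mathcal{C}(j) \mid \mathcal{Y}(Y_j)](8/5)^j \mid \mathcal{Z}(Z_{j+1})\right]$, we apply Lemma 14, which suggests that for any $Y_j \subseteq Z_{j+1}$ it is bounded by $2^9 c \cdot d(x,y) \cdot f(\log(1/\epsilon)) \cdot \mathbb{E}_{Y_j}\left[\ln\left(\frac{|Y_j|}{|B_{Y_j}(x)|}\right)\right]$.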

Now consider the reminder of the sum, let h' = max{?7i + l,j — k + 2}. Since for any m G [I, j — 1], 
E Zh [- | = Ez,(E z ,[- I Z{Z,)\ | we get that 



j'-i 



(11) 



2 E^, pr[C(m) | Z(Zv)] | -2(Z J+1 )] • (8/5)' 



= E 2 



£ E Zh pr[C(m) | 2(^)] | Z(Z,)] • (8/5) m | Z(Z J+1 ) 



^z h pr[C(m) | Z(Z,)] | ■ (8/5) m | Z(Z 3 - +1 ) 



+ (1- Pj )-E yj 



J2 ®z h pr[C(m) | Z(Z,)] | • (8/5) m | Z(Z i+1 ) 



Notice that $h'$ was changed to $h$, meaning that we added expectation over level $j - k + 1$ as well; this does not change the value of the expression. Applying the induction hypothesis to Equation 11 yields

i-i 

(12) J2 E ^ t pr i Z (^')i i z (^+i)] • ( 8 / 5 ) m 

m=l 

< 2 10 c-d(x,y)f(log(l/e))p r E Xj 

+2 10 c ■ d(x, y)/(log(l/e))(l - Pj )E Yj 



E E ^ 

i=j— fe+1 



E E ^ 



In (1^1) 
1 en ' 



I 2(^+1 ) 

I Z(Zj+i) 



We now have all the ingredients to prove the inductive claim of Equation 8. For abbreviation let $W = 2^9 c \cdot d(x,y)f(\log(1/\epsilon))$.
£ E Zh , pr[C(m) | Z(Zv)] I Z(Z j+1 )] ■ (8/5)
( \Zj+i\ 



< W ■ \pj- E X] 



In 



+W-(l- Pj )E Y: 
+2W- [ pj -Ex, 



Vl^(x)| 



hi 



Z(Z J+1 ) 

2(^ + i ; 



(l-Pi)Ey, 



In 



r is 



j+ii 



E E ^ 



( — ) 



W-pj-Ex, In 
+2W-(l-pj)E yj 



E 

i=j-fc+l 



2(^+1 ) 

I 2(^+1 



l^+i I 



E ^ 

i=j-k+l 



ln[^)\X(Xj, 



In 



V|Sy 3 (x)| 



E 

i=j-fc+l 



(^ ) 



Z(Z i+1 ) 

Z(Z j+1 ) 



+W- Pj -E x , 



E Ez; 

i=j-k+i 



(f) 



3+1 

< w- Pj Ez 

i=j-k+2 

i+i 

• Pi ^ E z , 

i=j-k+2 



3 + 1 



2W-(1- Pi ) ^ E ^ 

j=j-fc+2 



In if] |2(Z j+1 , 



half) |Z(Z, +1 ) 



C77 



i+i 



< 2 10 c-d( a ;,y)/(log(l/e)) £ E Zi 



hi 



\Zi[ 

en 



The first inequality follows from Equation 9, Equation 10 and Equation 12. The second equality is just 
a re-ordering of terms. The third inequality is the telescope argument, it holds since for any choice of 
Xj C Z j+ i, and any choice of Zj-k+i C Xj by definition rad(Z,-_ fc+ i ) < (5/8) fc_1 rad(X,) < 2 ^ f(it^2c/e)) , 
since $x \in Z_{j-k+1}$ it follows that $Z_{j-k+1} \subseteq B_{X_j}(x)$. The argument for $Y_j$ is similar. So the elements depending on $X_j$ and $Y_j$ cancel out, and we don't need the expectation on $X_j$ and $Y_j$ anymore.
Let $j = L + 1$, $Z_j = X$, then applying Lemma 12 and Equation 8 completes the proof.

L 



nd T (x,y)] < ^Pr[C m ]-2rad(T[^ m (x)]) 

TCI — 1 

l-\ L 

< 4c' J2 (8/5) m + 4c' J2 ^ [Pr[C m | Z(Z h )} \ Z(X)} (8/5)' 



< 4c'(5/3)(8/5/ + 2 12 c.c'.d(.T 72 ;)/(log(l/ e )) £) E Zi 



=L-fc+l 

< 8c' • d(s, y)64/(log(2c/e)) + 2 12 c • c' • d(x, y)/(log(l/e))2 4 c • loglog(l/e) \n{n/{en)) 
= 0(log 2 (l/c)) 



In 



en 



□ 